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This  report  discusses  an  iterative  methodolo- 
gy for  Logical  Database  Design.  The  inethodology 
includes  four  phases:  Local  Information-flow 
Modeling,  Global  Information-flow  Modeling,  Con- 
ceptual Schema  Design,  and  External  Schema  Model- 
ing. These  phases  are  intended  to  make  maximum 
use  of  available  information  and  user  expertise, 
including  the  use  of  a  previous  Needs  Analysis, 
and  to  prepare  a  firm  foundation  for  physical  da- 
tabase design  and  system  implementation.  The 
methodology  recommends  analysis  fran  different 
points  of  view — organization,  function,  and 
event —  in  order  to  ensure  that  the  logical  data- 
base design  accurately  reflects  the  requirements 
of  the  entire  population  of  future  users.  The 
methodology  also  recommends  computer  support  from 
a  data  dictionary  system,  in  order  to  conveniently 
and  accurately  handle  the  volume  and  complexity  of 
design  documentation  and  analysis.  The  report 
places  the  methodology  in  the  context  of  the  com- 
plete system  life  cycle.  An  appendix  of  illustra- 
tions shows  examples  of  how  the  four  phases  of  the 
methodology  can  be  implemented. 

Key  words:  data  dictionary  system;  data  dictionary 
system  standard;  data  management;  data  model;  da- 
tabase design;  database  management  system,  DBMS; 
Entity-Relationship-Attribute  Model;  Information 
Resource  Dictionary  System,  IRDS ;  logical  database 
design. 
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1.  INTRODUCTION 


1.1    What  Is  Logical  Database  Design? 


Logical  Database  Design  (LDD)  is  the  process  of  deter- 
mining the  fundamental  data  structure  needed  to  support  an 
organization/ s  information  resource.  LDD  provides  a  struc- 
ture that  determines  the  way  that  data  is  collected,  stored, 
and  protected  from  undesired  access.  Since  data  collection, 
storage,  and  protection  are  costly,  and  since  restructuring 
data  generally  requires  expensive  revisions  to  programs,  it 
is  important  that  the  LDD  be  of  high  quality.  This  guide 
describes  procedures  that  lead  to  the  development  of  a  high 
quality  LDD. 

A  high  quality  LDD  will  be:  (1)  internally  consistent, 
to  reduce  the  chances  of  contradictory  results  from  the  in- 
formation system;  (2)  complete,  to  ensure  that  known  infor- 
mation requirements  can  be  satisfied  and  known  constraints 
can  be  enforced;  and  (3)  robust,  to  allow  adaptation  of  the 
data  structure  in  response  to  foreseeable  changes  in  the  in- 
formation requirements.  To  fulfill  these  considerations,  a 
good  LDD  should  be  independent  of  any  particular  applica- 
tion, so  that  all  applications  can  be  satisfied,  and  in- 
dependent of  any  particular  hardware  or  software  environ- 
ment, so  that  the  data  structure  can  be  supported  in  any  en- 
vironment. A  good  LDD  will  ensure  that  modularity,  effi- 
ciency, consistency,  and  integrity  are  supported  in  the  data 
structure  underlying  the  databases  of  the  information  sys- 
tem . 


1.1.1  LDD^s  Relation  to  Other  Life  Cycle  Phases. 

LDD  is  closely  related  to  the  life  cycle  phases  of 
Needs  Analysis,  Requirements  Analysis,  and  Physical  Database 
Design.  Needs  analysis  and  requirements  analysis  provide 
the  information  requirements  needed  to  perform  LDD.  LDD 
produces  data  models  and  schemas  for  use  in  physical  data- 
base design.  The  Physical  Database  Design  phase  receives 
the  data  structures  prepared  during  LDD  and  adapts  them  to 
the  specific  hardware  and  software  environment  to  form  the 
internal  schema  of  each  database. 
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Figure  1  shows  LDD's  place  in  the  life  cycle  and  dep- 
icts the  functional  and  data  activities  that  can  be  per- 
formed in  parallel.  LDD  can  be  performed  in  parallel  to  the 
phases  of  Requirements  Analysis,  Systems  Specification,  and 
Systems  Design.  The  synchronized  performance  of  these 
phases  will  assist  in  providing  the  information  needed  for  a 
good  LDD  and  will  result  in  speeding  the  systems  development 
process . 

By  taking  a  brief  overview  of  the  development  of  an  in- 
formation system,  we  can  see  how  LDD  is  used.  The  life  cy- 
cle of  an  information  system  should  consist  of  the  following 
phases : 


1.  Needs  Analysis 

Also  known  as  Enterprise  Analysis,  this  phase  is  con- 
ducted before  other  work  on  the  systems  development 
project  begins.  Its  purpose  is  to  establish  the  con- 
text and  boundaries  of  the  systems  development  ef- 
fort, and  provide  the  focus,  scope,  priorities,  and 
initial  requirements  for  the  target  system. 

2.  Requirements  Analysis 

The  results  of  the  Needs  Analysis  are  carried  further 
in  this  phase,  which  provides  both  the  functional  and 
the  data  requirements  for  the  system  under  develop- 
ment. Requirements  analysis  is  performed  in  parallel 
to  the  LDD  and  Systems  Specification  phases.  Proto- 
typing may  be  performed  during  this  phase  to  refine 
requi  rements . 

3.  Systems  Specification 

During  this  phase,  the  functional  information  provid- 
ed by  requirements  analysis  is  used  to  produce 
specifications  for:  input  and  output  reports  that  are 
both  external  and  internal  to  the  system;  the  func- 
tions, processes,  and  procedures  of  operational  sub- 
systems;  and  decision  support  capabilities. 

4.  Logical  Database  Design 

This  phase  is  performed  concurrently  with  the  phases 
of  Requirements  Analysis,  Systems  Specification,  and 
Systems  Design.  During  this  phase,  the  data  require- 
ments provided  by  the  Needs  Analysis  and  Requirements 
Analysis  phases  are  used  to  perform  the  following 
iterative  data  modeling  and  design  activities: 


-3- 


A.  Local  and  Global  Information-flow  Modeling 

The  following  are  defined:  data  flows  throughout 
the  system;  information  models  for  each  applica- 
tion (i.e.,  local)  and  for  the  entire  system 
(i.e.,  global);  and,  data  classifications,  re- 
quirements, and  sources  for  the  subsystems  in- 
cluding those  for  decision  support.  The  LDD 
data  modeling  activities  correspond  to  the  func- 
tional specification  activities  of  to  the  Sys- 
tems Specification  phase. 

B.  Conceptual  and  External  Schemas 

The  following  are  defined:  data  structures  for 
system-wide  (i.e.,  conceptual)  and  application- 
oriented  (i.e.,  external)  views  of  the  system; 
user  views  of  the  databases  including  those  pro- 
viding decision  support  capabilities;  and  logi- 
cal database  schema  designs  and  constraints. 
LDD  schema  design  activities  correspond  to  the 
functional  design  activities  of  the  Systems 
Design  phase. 

5.  Systems  Design 

This  phase  delineates:  the  functional  control  flows 
using  the  data  flows  from  LDD;  high  level  and  de- 
tailed system  architectures;  the  software  structure 
design;  and  the  module  external  design  (i.e.,  the 
design  for  interfaces  among  modules  of  code) . 

6.  Physical  Database  Design 

This  phase  produces  physical  data  flows  and  the  de- 
tailed internal  schema  for  the  specific  hardware, 
software,  and  database  implementations  to  be  used,  in 
order  to  balance  maximum  data  storage  efficiency, 
data  retrieval  performance,  and  data  update  perfor- 
mance. Physical  database  design  is  performed  in 
parallel  to  the  Implementation  phase. 

7.  Implementation 

This  phase  produces:  logic  definition  for  programs; 
module  design;  internal  data  definitions;  coding; 
testing  and  debugging;  acceptance  testing;  and 
conversion  from  the  old  system  to  the  new  one. 
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8,     Operation  and  Maintenance 


During  this  phase  the  information  system  performs  to 
serve  the  users^  information  needs  and  to  collect 
data  about  the  system's  ongoing  operation.  Program- 
mers and  analysts  continue  to  debug  the  system  and 
modify  it  to  support  changing  users'  needs.  Database 
designers  continue  to  maintain  database  effectiveness 
and  efficiency  during  system  modifications  and  data 
changes.  When  modifications  to  the  system  are  no 
longer  adequate  to  support  user  needs,  the  current 
system  should  evolve  to  a  new  target  system  and  the 
cycle  will  begin  again. 

As  this  description  of  the  information  system's  life 
cycle  shows,  LDD  plays  a  major  role  in  development.  LDD 
greatly  enhances  the  performance  of  the  Quality  Assurance 
(QA)  process,  which  would  be  ongoing  from  the  Systems 
Specification  and  LDD  phases  through  the  Operation  and 
Maintenance  phase.  Because  LDD  emphasizes  the  iterative  ap- 
proach, QA  will  have  many  opportunities  to  check  the  results 
of  one  iteration  against  the  results  of  other  iterations. 
Since  LDD  is  performed  in  parallel  to  the  Requirements 
Analysis,  Systems  Specification,  and  Systems  Design  phases, 
QA  will  be  able  to  compare  both  the  interim  and  final 
results  of  concurrent  phases  to  resolve  any  difficulties 
sooner  than  through  the  traditional  approach.  The  automated 
Data  Dictionary  System  (DDS) ,  described  in  Section  1.2.2, 
should  be  used  during  Requirements  Analysis  and  LDD  to  pro- 
vide immediate,  shared  access  to  data  requirements  and  data- 
base designs,  and  to  support  the  QA  process. 


1.1.2  Characteristics  of  LDD. 

The  potential  benefits  of  LDD  to  the  development  life 
cycle  can  only  be  gained,  however,  through  a  good  quality 
LDD.  For  LDD  to  perform  its  role  well,  the  results  of  the 
logical  design  process  must  have  certain  characteristics.  A 
LDD  should  be: 

o  Independent  of  the  hardware  and  software  environ- 
ment, so  that  the  design  can  be  implemented  in  a 
variety  of  environments  and  so  the  design  will 
remain  relevant  even  if  the  hardware  and  software 
selected  to  support  the  information  system  eventual- 
ly change. 

o  Independent  of  the  implementation  data  model  or  the 
Database    Management     System     (DBMS)    in  use,  so  that 
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the  design  will  apply  to  any  present  or  future  data 
model  or  data  inanagement  system,  which  would  not 
necessarily  be  a  DBMS. 

o  Comprehensive  in  representing  present  and  future  ap- 
plications so  that  all  known,  anticipated,  and  prob- 
able needs  can  be  included  or  considered  in  the 
design,  to  avoid  costly  system  alterations  in  the 
future . 

o  Able  to  satisfy  the  information  requirements  of  the 
entire  organization,  encompassing  all  possible  ap- 
plications rather  than  being  limited  to  one  or  two; 
this  way  the  information  system  will  have  the  capa- 
city to  be  an  organizational  resource,  not  just  the 
resource  of  one  department  or  application  area. 

A  good  LDD  should  also  fulfill  a  set  of  precise  techni- 
cal goals  to  provide  a  firm  foundation  for: 

o  Maintainability  and  reusability,  achieved  through 
the  use  of  modularity  in  the  database  design. 

o  Robustness,  allowing  both  the  design  and  the  system 
to  be  adaptable  to  hardware  and  software  changes. 

o  Security,  controlled  through  compar tmentalization  in 
the  database  design  which  will  limit  specified  types 
of  data  access  to  designated  personnel  or  organiza- 
tional units. 

o  Update  and  storage  efficiency,  achieved  through  con- 
trolled redundancy  that  limits  the  number  of  places 
where  the  same  data  will  be  stored. 

o  Retrieval  efficiency,  so  that  data  can  be  organized 
to  be  readily  accessible  by  system  users. 

o  Consistency  and  integrity,  achieved  through  several 
measures  including  data  integrity  constraints  and 
controlled  redundancy. 

If  done  correctly,  logical  database  design  for  a  com- 
plex information  system  is  a  massive  undertaking.  The 
short-term  cost  of  LDD  is  great,  but  the  long-term  benefits 
of  better  information  and  greater  flexibility  provide  sub- 
stantial savings  over  the  system's  life  cycle. 
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1.2    An  Ideal  Logical  Database  Design  Methodology 


A  methodology  is  an  organized  system  of  practices  and 
procedures  applied  to  a  branch  of  knowledge  to  assist  in  the 
pursuit  of  that  knowledge,  which  in  this  case  is  database 
design.  In  other  words,  a  LDD  methodology  is  a  planned  ap- 
proach to  database  design  that  assists  in  database  develop- 
ment in  support  of  an  information  system. 


1.2.1  LDD  Practices. 

This  guide  describes  a  methodology  that  includes  the 
preferred  practices  and  procedures  characterizing  the 
development  of  a  good  quality  LDD  and  a  successful  informa- 
tion system.  Although  normalization  is  often  considered  the 
primary  activity  of  LDD,  normalization  is  only  one  of  many 
procedures  performed  in  LDD.  Normalization  is  a  valuable 
but  limited  tool  in  that  it  only  considers  functional  data 
dependencies.  Other  procedures  should  be  used  in  conjunc- 
tion with  normalization  for  a  coherent  database  design.  An 
ideal  LDD  methodology  should  be  supported  by: 


1.  A  LDD  guide,  such  as  the  one  provided  in  this  docu- 
ment, that  describes  clearly  defined  steps  for 
analysts  and  designers  to  follow  in  order  to  produce 
a  good  LDD. 

2.  Analytical  methods,  such  as  the  ones  described  in 
this  guide,  to  assist  in  the  detection  of  redundan- 
cies, incompleteness,  and  possible  errors  in  the  con- 
ceptual and  functional  data  modeling.  Some  of  these 
methods  include:  (a)  a  hierarchical,  iterative  ap- 
proach to  organizational  or  functional  concept 
development;  (b)  differentiation  of  various  points  of 
view  in  information  development,  such  as  organiza- 
tional components,  higher  and  lower  level  functions, 
and  event,  control,  and  decision  structures;  and  (c) 
normalization  procedures. 

3.  A  series  of  specified  checkpoints  for  progress  re- 
views by  designers  and  management,  and  for  informa- 
tion exchange  meetings  with  the  personnel  of  LDD^s 
parallel  phases.  Requirements  Analysis,  Systems 
Specification,  and  Systems  Design. 
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4.  A  mode  of  notation  (i.e.,  graphic  or  symbolic)  to 
describe  and  build  a  detailed  conceptual  model  of  the 
data  and  functions  under  study. 

5.  A  specification  language  (e.g.,  the  language  used  by 
a  Data  Dictionary  System)  to  specify  information  re- 
quirements and  the  LDD  design  in  a  consistent,  unam- 
biguous manner . 

6.  An  automated  tool  such  as  a  Data  Dictionary  System, 
capable  of  supporting  the  documentation  and  analysis 
of  LDD  complexity,  especially  for  large  systems 
development  projects.  This  tool  should  be  used  to 
assist  in:  (a)  describing  the  conceptual  model;  (b) 
describing  the  data  needed  to  support  the  functions 
of  the  conceptual  model;  (c)  performing  completeness 
and  consistency  checking  of  the  conceptual  model  and 
the  data  needed  to  support  the  functions  of  the  con- 
ceptual model    [AFIF84] . 


1.2.2  Data  Dictionary  System. 

A  Data  Dictionary  System  (DDS)  is  a  computer  software 
system  used  to  record,  store,  protect,  and  analyze  descrip- 
tions of  an  organization's  information  resources,  including 
data  and  programs.  It  provides  analysts,  designers,  and 
managers  with  convenient,  controlled  access  to  the  summary 
and  detailed  descriptions  needed  to  plan,  design,  implement, 
operate,  and  modify  their  information  systems.  The  DDS  also 
provides  end-users  with  the  data  descriptions  that  they  need 
to  formulate  ad  hoc  queries.  Equally  important,  it  provides 
a  common  language,  or  framework,  for  establishing  and  en- 
forcing standards  and  controls  throughout  an  organization. 

The  data  dictionary  (DD)  is  the  data  that  is  organized 
and  managed  by  the  Data  Dictionary  System.  The  DD  is  a 
resource  that  will  be  of  great  value  long  after  a  logical 
database  design  is  completed.  The  data  dictionary  can  pro- 
vide support  for  information  about  all  aspects  of  system 
development  to  be  stored,  updated,  and  accessed  throughout 
the  system's  life  cycle. 

The  term  Information  Resource  Dictionary  System  (IRDS) 
is  beginning  to  replace  the  term  Data  Dictionary  System  due 
to  recognition  of  the  flexibility  and  power  of  the  software 
[ANSI84,  FIPS80,  KONI84] .  This  paper  uses  the  terms  Data 
Dictionary  System  (DDS)  and  data  dictionary  (DD)  to  conform 
to  the  current  practice  of  software  vendors. 
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1.3     Intended  Audience  for   this  Guide 


This  guide  is  intended  primarily  to  provide  information 
and  guidance  to:  Data  Administrators  (DAs)  and  Database  Ad- 
ministrators (DBAs)  in  leading  their  LDD  projects;  Applica- 
tions Administrators  (AAs)  and  application  specialists  in 
the  types  of  data  and  data  validation  that  LDD  will  require; 
and,  end-users  and  systems  analysts  in  how  they  can  best 
contribute  to  the  LDD  project  to  maximize  its  benefits. 


1.4    Purpose  of  this  Guide 

This  guide  provides  a  coherent  plan  of  action  that  will 
allow  management  and  database  designers  to  direct  and  per- 
form the  database  design  successfully.  The  LDD  plan  offered 
here  is  sufficiently  general  to  be  compatible  with  existing 
tools  and  techniques  in  use  for  database  design.  By  defin- 
ing a  methodology  that  provides  a  more  stable  view  of  the 
relationships  among  data  items,  this  guide  can  be  used  to 
increase  the  effectiveness  of  an  inform.ation  system  over  its 
life  cycle. 

When  the  LDD  approach  described  here  is  used,  particu- 
larly if  used  with  the  assistance  of  a  Data  Dictionary  Sys- 
tem, an  increase  in  clear  communication  can  result  among  the 
end-users,  systems  analysts,  designers,  and  the  applications 
programmers  who  will  actually  code  and  implement  the  system. 
By  providing  a  detailed  and  unambiguous  description  of  the 
system^s  information  requirements  in  relation  to  the  users^ 
perspectives,  LDD  offers  a  bridge  between  the  end-users  and 
the  physical  database  designers  and  applications  program- 
mers. 

This  guide  describes  a  methodology  to  be  used  in  optim- 
izing the  flexibility  and  integrity  of  an  information  sys- 
tem. Flexibility  will  be  ensured  through  the  identification 
of  the  least  changing  characteristics  of  the  system,  which 
give  a  stable  foundation  upon  which  to  build  the  information 
system.  Data  integrity  will  be  optimized  through  the  cen- 
tralized control,  completeness,  and  consistency  that  a  qual- 
ity LDD  will  provide.  The  information  system  that  results 
from  these  LDD  procedures  will  perform  better  over  the 
system's  life  cycle  because  it  will  address  current  and 
probable  future  needs  more  completely  and  will  allow  re- 
quirements changes  to  be  incorporated  more  effectively. 
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1.5  Assumptions 


Several  assumptions  have  been  made  in  the  preparation 
of  this  guide  about  the  types  of  information  systems  in 
which  LDD  will  be  used.  Because  LDD  is  a  non-trivial  pro- 
cess to  be  undertaken  when  a  need  for  it  exists,  it  is  as- 
sumed that: 


o  The  information  system''s  databases  will  be  sizable 
and  complex  to  support  multiple  applications,  may 
have  no  single  dominant  application,  and  will  prob- 
ably contain  tens  or  hundreds  of  data  collections  and 
relationships,  and  thousands  of  data  elements.  DBMS 
support  is  not  assumed,  although  it  is  usually  desir- 
able. 

o  The  information  system  and  its  databases  are  intended 
for  use  over  a  long  period  of  time  so  that  the  bene- 
fits to  the  life  cycle  costs  will  justify  the  invest- 
ment of  time,  money,  and  effort  in  LDD. 

o  The  data  requirements  of  the  information  system  will 
be  significant  and  include  the  use  of  ad  hoc  queries 
where  the  precision  of  the  database  structure  will 
prove  important. 


1.6    Scope  of  this  Guide 


This  guide  is  limited  in  scope  to  the  LDD  phase.  The 
interaction  of  LDD  with  the  immediately  preceding  and  subse- 
quent life  cycle  phases  is  mentioned,  since  these  determine 
LDD^s  information  resources  and  products.  Because  LDD  works 
from  the  results  of  the  preceding  Needs  Analysis  and  con- 
current Requirements  Analysis  phases,  and  prepares  a  founda- 
tion for  the  subsequent  Physical  Database  Design  phase, 
these  phases  will  be  described  briefly. 
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1.7    Structure  of  this  Guide 


Chapter  2  addresses  the  relationship  between  LDD  and 
the  phases  of  Needs  Analysis,  Requirements  Analysis,  and 
Physical  Database  Design.  The  major  phases  of  the  LDD  ap- 
proach are  further  discussed  along  with  the  types  of 
analysis  strategies  that  will  be  needed  to  accompany  LDD. 
Figure  2,  in  Section  2.2.2,  illustrates  the  interaction  of 
the  four  phases  of  the  LDD  methodology  to  assist  the  reader 
in  visualizing  the  LDD  process. 

In  Chapter  3,  the  organizational  aspects  of  the  LDD 
project  are  described,  including  the  key  roles  in  LDD 
development,  the  training  required  for  the  personnel  in 
these  roles,  and  the  part  played  by  management  in  planning 
for  and  monitoring  the  LDD  process. 

The  following  chapters,  4  through  7,  define  the  four 
phases  of  the  LDD  approach  in  detail.  Chapters  4  through  7 
are  identically  structured  so  that  each  chapter  has  three 
sections:  (1)  the  first  section  of  each  phase  discusses  the 
information  used  by  that  phase,  (2)  the  second  section 
discusses  the  general  functions  of  that  phase,  and  (3)  the 
third  section  discusses  the  procedure  for  accomplishing  that 
phase.  The  third  section  of  each  phase  includes  a  diagram 
of  the  steps  within  that  phase,  followed  by  a  subsection  on 
each  step.     Each  step  is  followed  by  a  summary  chart. 

Chapter  4  discusses  Local  Information- flow  Modeling  and 
describes  three  modes  of  analysis  corresponding  to  the  tar- 
get system's  (1)  organizational  components,  (2)  functions, 
and  (3)  the  events  to  which  the  target  information  system 
will  respond.  These  three  analysis  modes  are  examined  in 
relation  to  data  flow  and  data  structure  design  techniques. 

Chapter  5  addresses  Global  Information-flow  Modeling 
and  emphasizes  the  need  to  balance  the  perspectives  of  data 
flow  and  data  structure  in  the  development  of  a  design  that 
will  favor  both  equally.  The  Conceptual  Schema  Design  is 
described  in  Chapter  6  in  relation  to  the  use  of  Entity- 
Relationship-Attribute  (E-R-A)  data  modeling  diagrams  and 
normalization  techniques.  Chapter  7  defines  External  Schema 
Modeling  (i.e.,  subschema  modeling)  as  it  reflects  the  data 
structure  and  data  flow  from  the  end-user's  perspective  in 
the  development  of  workload  specifications  for  physical  da- 
tabase design. 
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A  glossary  of  acronyms  used  in  this  guide  is  included 
at  the  beginning  of  the  document  for  reference.  An  appendix 
of  examples  has  been  included  at  the  end  of  the  document  to 
illustrate  the  types  of  graphics  that  will  be  used  and 
analysis  that  will  occur  during  the  four  phases  of  LDD. 
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2.     THE  FRAMEWORK  THAT  SUPPORTS  LDD 


LDD  plays  an  important  part  in  the  life  cycle  of  the 
information  system.  This  chapter  describes:  (1)  the  rela- 
tionship between  the  database  design  and  the  functioning  of 
the  information  system;  (2)  the  interactions  between  LDD  and 
the  Needs  Analysis,  Requirements  Analysis,  and  Physical  Da- 
tabase Design  phases;  (3)  the  information  requirements  need- 
ed to  perform  LDD;  (4)  the  phases  within  LDD;  and  (5)  stra- 
tegies for  LDD  development  and  their  impact. 


2.1     The  Role  of  LDD  in  the  Life  Cycle  j 

J 

LDD  defines  the  data  structure  that  supports  the    data-  | 
bases  of  an  information  system.     The  database  system  and  the 
information  system  are  inextricably     linked,     but     they  are 

different.  ; 

•I 

An  information  system  is  one  or  more  multi-purpose  com-  i 

puter     systems     that     may    be  supported  by  a  network  through  j 

which  many  types  of  users,  perhaps  in    different     locations,  ] 

update,     query,     and     provide  data  to  the  system  in  order  to  j 

have  current  information  available  on  a  variety    of     topics.  | 

Decision  support  capabilities  may  be  incorporated  in  the  in-  (| 
formation  system's  structure    to    assist     end-users     in  the 

decision-making  process.  i 

A  database  is  a  component  of  an  information  system     and  j 

may    contain    a     variety  of  general  and  detailed  information  j 

that  is  made  available  to  the  information  system's  end-users  n 
through      queries.       The    information     system's     ability  to 

respond  to  user's  queries  is  directly  related  to  logical  da-  i 

tabase  design.  | 

The  design  of  the  information  system's    databases    will  ] 

determine     the    ways     in    which     the  information  system  will  i 

function.     If  the  information  system     will     be     required     to  | 

answer     ad  hoc  queries,  the  data  structures  within  the  data-  ] 

bases  should  be  modeled  to  provide    maximum     flexibility     in  j 

data  accessibility  and  retrieval.  If  the  system  will  be  re-  j 
quired  to   respond  quickly    to     certain    predefined  queries, 

then     the     structural  modeling  should  be  constructed  to  sup-  • 
port  rapid  retrieval  performance,  which  will     generally  re- 
quire     indexes     or     redundant  data.     If  the  time  and  expense 
needed  to  update  the  data  in  the  system  are  of  paramount  im- 
portance,    then     ease     in     locating  and  changing  data  values 
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should  be  stressed  in  the  database  design.  If  the  storage 
cost  of  large  databases  is  a  primary  consideration,  then  the 
minimization  of  physical  redundancy  should  be  emphasized  in 
the  database  design. 

Usually  a  combination  of  such  requirements  exist  for  an 
information  system,  with  conflicting  implications  for  the 
design  of  the  underlying  databases.  These  requirements  and 
their  implications  for  the  databases  that  support  the  infor- 
mation system  are  defined  during  the  LDD  phase,  and  their 
conflicts  are  resolved  during  the  Physical  Database  Design 
ph  as  e . 

The  structure  of  the  logical  design  of  the  database 
plays  a  crucial  role  in  determining  the  capabilities  and 
performance  of  an  information  system.  A  good  physical  data- 
base design  cannot  be  developed  without  adequate  prepara- 
tion. A  good  logical  database  design  prepares  the  ground- 
work for  a  quality  physical  database  design  and  a  successful 
system  implementation. 

The  phases  of  Needs  Analysis,  Requirements  Analysis, 
Logical  Database  Design,  and  Physical  Database  Design  are 
closely  linked.  The  ability  to  perform  the  subsequent 
phases  is  determined  by  the  performance  of  the  previous  and 
parallel  phases.  Each  of  these  phases  must  be  performed 
well  for  the  resulting  database  to  represent  the  desired 
system  accurately.     These  phases  are  described  below. 


2.1.1  Needs  Analysis. 

As  we  have  seen  in  Chapter  1,  a  Needs  Analysis 
describes  the  primary  needs  a  new  information  system  should 
fulfill.  Without  this  formal  expression  of  the 
organization's  perception  of  its  needs,  the  analysts  and 
designers  will  have  to  work  from  their  own  assumptions  of 
the  information  system's  purposes.  Their  assumptions  could 
unknowingly  conflict  with  the  organization's  vaguely 
described  or  unstated  purposes.  The  resulting  lack  of  clar- 
ity in  direction  would  be  costly. 

A  specific  Needs  Analysis  methodology  should  be  adopted 
and  used  by  an  organization  previous  to  undertaking  any  ex- 
tensive systems  development  project.  The  use  of  a  well- 
defined  methodology  assures  that  most,  if  not  all,  of  the 
important  questions  about  the  purpose  of  the  proposed  system 
will  have  been  asked  and  answered  at  the  end  of  the  Needs 
Analysis  phase.  One  of  the  most  familiar  and  extensively 
used  Needs  Analysis  methodologies  available  at  this  time  is 
IBM's  Business  Systems  Planning   (BSP)   approach  [MART82]. 
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In  the  Needs  Analysis  methodology  adopted,  the  follow- 
ing minimum  set  of  questions  should  be  posed: 


1.  What  organizational  problems  require  a  solution  that 
the  target  information  system  could  effect? 

2.  What  new  or  improved  information  is  needed  to  perform 
what  types  of  functions? 

3.  What  are  the  boundaries  and  interfaces  of  the  target 
system? 

4.  What  possible  improvements  in  information  availabili- 
ty could  be  expected  from  the  target  information  sys- 
tem? The  following  are  goals  of  many  system  develop- 
ment projects: 


o    Greater  accuracy  of  information, 
o    Improved  timeliness, 
o    Better  end-user  interfaces, 
o     Improved  privacy  and  security, 
o    Rapid  access  to  distant     information  centers 
by  information  sources  and  end-users. 


Once  a  Needs  Analysis  methodology  has  been  adopted  and 
these  types  of  questions  have  been  answered  in  detail,  the 
purposes  and  plans  for  the  systems  development  project  can 
be  made  available  to  the  systems  development  personnel.  If 
the  Needs  Analysis  has  been  performed  well  and  a  comprehen- 
sive methodology  has  been  used,  sufficient  information  has 
probably  been  collected  for  LDD  to  begin.  Close  coordina- 
tion with  the  Requirements  Analysis  phase  is  needed  for  LDD 
to  continue. 


2.1.2  Requirements  Analysis. 

The  requirements  analysis  effort  will  verify  and  sup- 
plement the  results  of  the  Needs  Analysis  phase.  Since  LDD 
and  Systems  Specification  are  directly  supported  by  the  con- 
current Requirements  Analysis  phase,  it  is  critical  that  the 
procedures  and  performance  of  requirements  analysis  be 
planned  carefully  to  coordinate  with  these  other  phases. 

The  Requirements  Analysis  phase  will  involve  two  types 
of  analysis:  (1)  analysis  of  the  types  of  data  and  data 
flows  needed  within  the  organization;  and  (2)  analysis  of 
the     functions     performed  within  the  organization  which  will 
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require  the  use  of  this  data.  The  purpose  of  requirements 
analysis  is  to  provide  data  requirements  to  support  the  LDD 
phase,  and  functional  requirements  to  support  the  Systems 
Specification  phase. 

Requirements  analysts  verify  which  functions  and  sub- 
systems will  remain  external  to  the  system,  and  require  in- 
terfaces. By  defining  the  information  products  of  external 
subsystems  or  systems  that  are  inputs  to  the  target  system, 
and  by  defining  the  information  products  of  the  target  sys- 
tem that  are  used  by  external  subsystems  or  systems,  the 
analysts  can  designate  the  high  level  input/output  transfor- 
mations of  information  that  must  take  place  within  the  tar- 
get system.  The  specific  functions  and  subfunctions  per- 
formed within  the  target  system  are  logically  organized  and 
described.  Further,  the  analysts  define  the  known  con- 
straints on  accuracy,  timeliness,  and  other  performance  re- 
quirements, which  will  be  further  defined  in  LDD.  Once  gen- 
eral requirements  have  been  described,  further  refinements 
of  the  requirements  are  developed.  Prototyping  may  be  used 
in  conjunction  with  the  LDD  and  Systems  Specification  phases 
to  refine  and  model  requirements. 

As  requirements  are  defined,  the  information  may  be 
stored  in  the  form  of  a  data  dictionary  to  be  manipulated  by 
a  Data  Dictionary  System.  The  use  of  a  DDS  will  provide  au- 
tomated support  for  the  storage,  analysis  and  querying  of 
data,  for  the  definition  and  presentation  of  technical  and 
management  reports,  and  for  the  simultaneous  access  of  re- 
quirements information  for  use  in  concurrent  phases.  Re- 
quirements information  stored  in  a  data  dictionary  can  be 
supplemented  with  information  from  LDD  and  other  phases,  and 
can  be  maintained  for  on-line  use  throughout  the  system^s 
life  cycle. 


2.1.3  Logical  Database  Design. 

The  LDD  designers  decide  which  data  must  be  stored  and 
maintained  to  support  the  functions  and  subfunctions  of  the 
target  system.  By  abstracting  from  the  functions  to  the 
data  structures,  the  designer  defines  the  data  objects  to  be 
modeled  and  decides  which  properties  and  constraints  are 
relevant  in  modeling  these  objects.  The  Conceptual  Schema 
is  the  primary  product  of  LDD. 

The  Entity-Relationship-Attribute  modeling  technique 
has  been  chosen  to  define  the  LDD  data  structure  (see 
Chapter  6) .  Organizations  that  prefer  other  equivalent  data 
modeling  techniques  may  easily  adapt  this  LDD  methodology  to 
those  techniques. 
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An  important  consideration  for  LDD  is  to  ensure  that 
all  information  required  from  the  LDD  phase  is  developed  and 
provided  to  the  Physical  Database  Design  phase  at  the  ap- 
propriate time.  This  information  required  from  LDD  includes 
the  volume  of  data,  the  priority  and  frequency  of  the  logi- 
cal access  paths  to  be  implemented  in  the  physical  database, 
and  constraints  on  performance,  integrity,  security,  and 
pr  ivacy . 


2.1.4  Physical  Database  Design. 

The  first  step  of  the  Physical  Database  Design  phase  is 
to  select  the  appropriate  data  model  (e.g.,  relational,  net- 
work, or  hierarchical)  and  the  data  management  system  to 
support  it.  This  selection  may,  unfortunately,  be  dictated 
by  the  software  that  the  organization  is  currently  using,  or 
by  the  availability  of  software  for  hardware  that  has  al- 
ready been  procured.  Preferably,  the  data  model  and  the 
data  management  system  will  be  selected  to  match  the  re- 
quirements defined  by  the  LDD  Conceptual  Schema  and  the 
workload.  A  useful  reference  in  the  selection  process  is 
[GALLS 4] . 

The  second  step,  once  the  selection  has  been  made,  is 
to  translate  the  Entity-Relationship-Attribute  model  from 
the  Conceptual  Schema  into  the  selected  data  model.  This 
translation  is  a  rather  simple  matter  for  the  relational 
model:  entities  become  tables,  relationships  are  implemented 
by  means  of  foreign  keys,  and  attributes  become  columns. 
The  network  model  translation  is  not  much  more  difficult: 
entities  become  records,  relationships  become  sets  or  re- 
peating groups,  attributes  become  data  items,  and  attributes 
are  omitted  from  a  member  record  if  they  are  in  the  owner. 
The  hierarchical  model  is  difficult:  entities  become 
records,  attributes  become  data  items,  but  relationships  may 
become  either  true  hierarchical  relationships  or  logical 
children.  These  translations  are  discussed  in  detail  in 
[CHEN82]   and  papers  referenced  therein. 

The  next  step  is  to  develop  a  detailed  physical  data 
structure,  including  the  development  of  indexes  and  other 
access  paths,  detailed  record  structures  (perhaps  combining 
the  logical  records  to  reduce  physical  accesses),  loading 
factors,  and  so  on.  Detailed  methodologies  are  discussed  in 
[CARLS  0,   CARLS  1,  MARC7  8]. 
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2.2     Detailed  Framework  for  LDD 


The  information  requirements  needed  for  the  performance 
of  LDD  are  described  in  Section  2.2.1.  Although  LDD  has 
previously  been  presented  as  a  single  phase  within  the  in- 
formation system  life  cycle,  in  Section  2.2.2  LDD  will  now 
be  subdivided  into  four  simpler  phases  to  be  performed 
iteratively.  Strategies  for  analysis  and  the  information 
requirements  of  these  phases  will  be  described  in  detail  in 
Section  2.2.3. 


2.2.1  LDD  Information  Requirements. 

In  addition  to  information  obtained  from  Needs 
Analysis,  LDD  designers  will  need  other  information  to  be 
collected  and  analyzed  during  the  Requirements  Analysis 
phase,  conducted  in  parallel  to  LDD  and  Systems  Specifica- 
tion. The  following  information  must  be  available  to  LDD 
designers: 


o  Predefined  constraints  on  the  system,  such  as  the  use 
of  existing  hardware  or  software,  the  need  to  convert 
an  existing  system,  and  the  scope  of  the  projected 
information  system. 

o  Project  constraints,  such  as  the  amount  of  time,  mo- 
ney and  personnel  allocated  by  the  organization  for 
the  development  project. 

o  Processing  requirements,  such  as  the  type  of  func- 
tions that  the  information  system  will  be  expected  to 
perform,  and  the  general  application  areas  that  it 
will  be  expected  to  support. 

o  Organizational,  functional  and  data  subsets,  such  as 
departments,  types  of  actions,  and  types  of  informa- 
tion that  the  target  system  will  be  expected  to  sup- 
ply or  support. 

o  Performance  requirements,  such  as  maximum  retrieval 
and  update  times. 

o  Capacity  requirements,  such  as  the  number  of  data  ob- 
jects within  the  target  system,  and  storage  restric- 
tions if  the  limitations  of  existing  hardware  are  ap- 
plicable. 
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o  Data  integrity  requirements,  such  as  the  control 
needed  over  redundant  data,  and  the  need  for  automat- 
ed integrity  checks  during  data  input  and  update,  in- 
cluding edit  and  validation  rules. 

o  Security  and  privacy  requirements,  such  as  the  need 
for  encryption  for  some  types  of  data,  or  the  limita- 
tion of  access  for  certain  types  of  data  to  specific 
personnel . 

o  Reliability  and  maintainability  requirements  that  de- 
fine the  need  for  the  continuous  functioning  of  the 
system . 

o  Distributed  processing  and  data  requirements,  such  as 
the  need  for  network  connections  among  databases  in 
multiple  locations,  or  the  need  for  shared  or  repli- 
cated data  in  multiple  locations. 


2.2.2  LDP  Phases. 

As  we  have  seen  from  Chapter  1,  LDD  generally  involves 
information  modeling  and  database  design  that  are  largely 
hardware  and  software  independent.  LDD  focuses  attention  on 
the  subsystems  that  generate  the  information  comprising  the 
target  system.  Throughout  the  phases  of  LDD,  each  subsystem 
is  examined  and  described  in  terms  of:  (1)  the  organization- 
al components,  (2)  the  application  areas  or  functions,  and 
(3)  the  events,  which  occur  within  or  affect  that  subsystem. 
The  number  and  type  of  these  subsystems  to  be  analyzed  dur- 
ing each  phase  of  LDD  will  depend  on  the  type  of  analysis 
strategy  selected,  as  described  in  Section  2.2.3. 

LDD  consists  of  four  distinct  phases  during  which  all 
the  subsystems  within  the  system,  the  data  flows,  data 
structures,  and  user  views  of  the  databases  are  described. 
These  phases  are  performed  iteratively  and  in  sequence  until 
the  LDD  is  completed.  The  phases  of  LDD  are  the  subject  of 
this  paper  and  are  described  more  fully  beginning  at  Chapter 
4.     In  brief,  the  four  phases  of  LDD  are: 


1.     Local  Information-flow  Modeling 

During  this  phase,  data  flows  are  modeled  for  indivi- 
dual subsystems  within  the  target  system,  including 
each  organizational  component,  function,  and  event. 
Subsystems  are  modeled  one  at  a  time.     A  data  flow  is 
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the  information  that  is  exchanged,  or  "flows,"  within 
and  between  subsystems.  Data  is  defined  at  a  general 
rather  than  specific  level,  in  terms  of  general  for- 
mats or  packages  (e.g.,  all  the  data  contained  within 
a  particular  type  of  report).  The  products  of  this 
phase  are  Local  Information-flow  Models    (LIMs) . 

2.  Global  Information-flow  Modeling 

During  this  phase,  individual  data  flows  are  combined 
and  global  data  flows  are  modeled  for  collections  of 
individual  subsystems  (i.e.,  organizational  com- 
ponents, applications,  or  events)  viewed  as  a  whole. 
Data  will  continue  to  be  viewed  at  the  format  or 
package  level.  The  products  of  this  phase  are  Global 
Information-flow  Models   (GIMs)  . 

3.  Conceptual  Schema  Design 

During  this  phase,  the  data  within  the  data  flows, 
defined  in  the  previous  phases,  is  abstracted  from 
the  packages  in  which  it  resides,  and  defined  in 
terms  of  its  functional  use.  The  data  is  described 
in  terms  of:  (a)  entities,  the  basic  data  components; 
(b)  relationships,  the  ways  in  which  entities  are  as- 
sociated with  each  other  or  share  characteristics; 
and  (c)  attributes,  the  data  that  describes  the  data 
entities.  Entity-Relationship-Attribute  (E-R-A)  di- 
agrams may  be  used  as  an  analysis  method.  The  E-R-A 
abstraction  provides  the  basis  for  a  conceptual  data 
structure.  The  products  of  this  phase  are  Conceptual 
Schemas    (CSs)  . 

4.  External  Schema  Modeling 

During  this  phase,  the  conceptual  schema  is  adapted 
to  conform  to  the  needs  of  the  application  areas 
within  the  information  system.  By  modeling  the  data 
from  the  user's  perspective,  the  designer  is  able  to 
verify  the  Conceptual  Schema  and  derive  a  structured 
user'*s  view  of  the  data.  The  products  of  this  phase 
are  External  Schemas  (ESs)  and  are  also  known  as 
subschemas . 


Figure  2  depicts  the  iterative  relationship  of  the  four 
>D  phases.  The  vertical  line  through  the  center  indicates 
division  between  the  phases  on  the  left  that     are  oriented 
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DIAGRAM  OF  THE  FOUR  LDD  PHASES 
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toward  a  specific  application  (e.g.,  toward  one  organiza- 
tional component,  function,  or  event),  and  those  phases  on 
the  right  that  are  oriented  toward  organizing  these  specific 
applications  into  areas  of  general  interest. 

The  horizontal  line  across  the  diagram  indicates  a 
division  between  the  upper  phases  that  are  oriented  toward 
the  performance  of  functions  and  the  dynamic  data  flow  among 
these  functions,  and  the  lower  phases  that  are  oriented  to- 
ward relatively  static,  shared  data  structures. 

At  the  top  of  the  diagram.  Needs  Analysis  and  Require- 
ments Analysis  indicates  that  these  phases  provide  informa- 
tion to  LDD.  The  results  of  Needs  Analysis  may  be  suffi- 
cient to  begin  the  initial  iterations  of  the  LIM  and  GIM 
phases,  particularly  if  the  Business  Systems  Planning  (BSP) 
methodology  has  been  used.  Subsequent  iterations  will  re- 
quire further  information  from  the  Requirements  Analysis 
phase. 

The  diagram  in  Figure  2  should  be  read  clockwise,  be- 
ginning at  Local  Information- flow  Modeling  (LIM) ,  where  data 
flows  are  modeled.  In  Global  Information-flow  Modeling 
(GIM),  the  individual  data  flows  from  LIM  are  combined  into 
global  data  flows.  These  are  abstracted  to  the  underlying 
shared  entities,  relationships  and  attributes  in  the  Concep- 
tual Schema  (CS) .  Parts  of  the  CS  are  then  extracted  to 
form  each  External  Schema  (ES) ,  which  is  a  particular  user's 
view  of  the  shared  data.  At  this  point,  each  ES  is  then 
compared  with  the  appropriate,  previously  developed  LIM,  to 
ensure  that  the  data  required  by  the  LIM  has  been  included 
in  the  ES  view.  When  errors  are  detected  in  this  comparis- 
on, the  ES,  and  possibly  the  CS ,  will  require  modification. 
The  workload  data  that  was  originally  developed  for  the  LIM 
is  translated  into  operations  on  data  in  the  ES.  Finally, 
the  workload  data  and  the  CS  are  passed  on  to  the  next  life 
cycle  phase.  Physical  Database  Design,  for  the  development 
of  the  internal  schema. 


2.2.3  Strategies  for  LDD  Development. 

Several  analysis  strategies  are  possible  in  approaching 
LDD.  The  choice  of  the  strategy  will  depend  on  the  type  of 
system  to  be  developed  and  the  definition  of  the  data  that 
will  need  to  be  integrated  in  its  design.  The  scope  of  the 
data  can  be  described  as  horizontal  and  the  level  of  detail 
as  vertical.  The  system  can  be  viewed  horizontally  in  the 
breadth  of  functions  that  the  information  system  will  sup- 
port.      If     the    system    will  provide  many  functions  to  many 
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departments  or  locations,  then  the  system  and  its  data  will 
have  a  broad,  horizontal  scope.  If  the  system  performs  few 
functions  but  performs  them  in  great  detail,  then  the  system 
and  its  data  will  have  a  depth  of  detail.  A  large  system 
will  generally  include  both  a  breadth  of  scope  and  a  depth 
of  detail.  Three  possible  strategies  for  approaching  the 
logical  design  phases  are  described,  with  their  ramifica- 
tions for  system  development  success.  Refer  to  Figure  2  in 
following  the  sequence  of  LDD  procedures  for  the  following 
strategies.     The  three  strategies  for  approaching  LDD  are: 


1.  Breadth  First. 

In  this  strategy,  a  large  number  of  Local 
Information-flow  Models  (LIMs)  will  be  developed  at 
first,  but  in  limited  detail.  The  LIMs  will  then  be 
consolidated  into  one  Global  I nf ormat ion- f low  Model 
(GIM)  with  a  broad  scope  but  limited  detail.  One  or 
more  Conceptual  Schemas  (CSs)  will  be  developed  with 
broad  scope  but  limited  detail.  The  External  Schemas 
(ESs)  extracted  from  the  CS  will  provide  quality  con- 
trol and  structure  for  the  next  iteration  of  LIM. 
The  LDD  phases  will  be  repeated  for  the  various  sub- 
systems, adding  greater  detail  for  each  LIM,  until 
the  data  element  level  is  reached.  This  strategy  is 
analogous  to  top-down  system  design. 

Impact;  This  strategy  is  appropriate  for  the  develop- 
ment of  very  large,  very  complex  information  systems, 
where  a  great  depth  and  breadth  of  data  must  be  in- 
tegrated through  the  development  process. 

2.  Depth  First . 

In  this  strategy,  a  small  number  of  LIMs  will  be 
developed  through  iterations  of  the  LDD  phases  to  the 
data  element  level.  The  LIMs  will  be  consolidated 
into  a  GIM  having  depth  of  detail  but  a  limited  hor- 
izontal scope.  A  small  number  of  ESs  will  be 
developed,  again  with  depth  of  detail  but  limited 
scope.  Further  iterations  of  the  entire  process  are 
developed  until  the  desired  horizontal  scope  is  at- 
tai  ned . 

Impact:  This  strategy  is  inappropriate  for  the 
development  of  an  information  system  that  requires 
the  integration  of  design  components  of  considerable 
scope  and  many  levels  of  detail.  The  use  of  this 
strategy  may  result  in  the  need  to  redesign  the  sys- 
tem     to      effect       integration.       This     strategy  is 
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appropriate  only  for  the  development  of  throw- away  or 
expendable  training  or  prototype  projects,  such  as  a 
prototype  system  used  to  verify  a  development  con- 
cept, or  an  experimental  system  used  to  train  person- 
nel in  other  systems  development  concepts  or  in  Data 
Dictionary  System  use. 

3.     Critical  Factors  First. 

In  this  strategy,  a  large  number  of  LIMs  are 
developed,  including  details  for  the  critical  aspects 
of  the  target  system  (e.g.,  critical  functional  re- 
quirements, critical  performance  characteristics, 
proof  of  concept,  etc.).  The  LIMs  will  be  consoli- 
dated into  a  GIM  with  broad  scope  but  uneven  detail. 
One  or  more  CSs  will  be  developed  with  the  same  broad 
scope  but  uneven  levels  of  detail.  The  process  will 
be  repeated  with  increasing  levels  of  detail  for  each 
LIM,  with  subsystems  analyzed  in  order  of  priority, 
until  the  data  element  level  is  reached.  The  criti- 
cal subsystems  will  be  processed  through  the  LDD  cy- 
cle first,  and  the  non-critical  subsystems  will  fol- 
low later . 

Impact;  This  strategy  is  appropriate  for  the  develop- 
ment of  a  very  large  system  if  the  critical  factors 
of  the  target  system  can  be  identified  and  accepted. 
It  is  also  appropriate  for  prototype  development  and 
for  evolutionary  development,  where  some  functions 
will  be  implemented  first  and  other  functions  will 
follow. 


2.2.4  Summary  of  LDD  Features. 

The  four  phases  of  LDD  use  a  variety  of  symbologies  to 
assist  in  analysis.  These  include  the  use  of  bubble  di- 
agrams in  the  analysis  of  data  flows,  Entity-Relationship- 
Attribute  (E-R-A)  diagrams  in  CS  development,  normalization 
analyses  where  applicable,  and  Data  Dictionary  System  (DDS) 
contents  and  automated  analysis  reports  throughout  LDD. 

The  outputs  of  LDD's  phases  are:  Local  Information-flow 
Models  (LIMs)  and  Global  Information-flow  Models  (GIMs)  that 
model  data  flows  for  the  organizational  components,  func- 
tions, and  events;  Conceptual  Schemas  (CSs)  that  provide  an 
E-R-A  model,  or  another  type  of  data  model,  for  use  by  pro- 
grammers and  designers;  and  External  Schemas  (ESs)  that 
present  an  application-oriented  user  view  for  use  within  the 
organization  as  a  representation  of  the  data  to  be  included 
in  the  target  system. 


-25- 


3.      PROJECT  ORGANIZATION 


For  LDD  to  be  performed  successfully,  plans  should  be 
made  to  support  the  information  requirements  of  LDD  and  to 
incorporate  LDD  roles  into  the  organization.  In  this 
chapter,  LDD  functional  roles,  training,  and  project  plan- 
ning needs  are  described. 


3.1     Functional  Roles  Needed  for  LDD 


The  following  functional  roles  are  described  in  terms 
of  the  development  of  LDD.  A  role  may  be  performed  by  many 
people,  or  one  person  may  perform  several  roles,  depending 
on  the  complexity  of  the  database.  Some  LDD  roles  may  over- 
lap with  roles  to  be  performed  in  Requirements  Analysis  and 
other  phases.     The  roles  required  for  LDD  are  the  following: 


o  Application  Administrators  (AAs)  who  will  work  with 
designers  and  analysts  to  define  and  validate  the 
data  and  functions.  One  or  more  AAs  may  be  needed 
according  to  the  size  of  the  system  and  the  complexi- 
ty of  the  application  areas.  AAs  will  work  with  a 
number  of  application  specialists, 

o  Application  Specialists  who  are  knowledgeable  about 
the  application  data  being  modeled,  or  about  the  ap- 
plication functions  that  use  the  data,  or  about  both. 
The  application  specialists  will  assist  the  designers 
and  analysts  in  preparing  an  accurate  LDD. 

o  Data  Administrator  (DA)  who  will  facilitate  the  LDD 
and  systems  development  process  by  ensuring  con- 
sistency in  data  definition,  and  overseeing  the  data 
management,  data  integrity,  and  data  security  func- 
tions performed  in  LDD  development.  The  DA  will  con- 
tinue to  perform  this  role  in  regulating  these  facets 
of  the  information  system  once  it  is  completed,  and 
so  will  also  use  the  LDD  once  it  is  developed.  The 
DA  may  have  a  sizable  staff,  depending  on  the  com- 
plexity of  the  data  resource  and  the  time  available 
to  perform  LDD  and  other  tasks.  The  DA  staff  may  in- 
clude the  Database  Administrator  and  the  Data  Dic- 
tionary Administrator.  The  DA  staff  will  work  close- 
ly with  the  AAs. 
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Database  Administrator  (DBA)  who  will  control  the  da- 
tabase and  the  DBMS,  facilitate  the  LDD  and  systems 
development  process,  assist  in  data  maintenance,  and 
use  the  LDD  as  it  is  developed.  The  DBA  is  concerned 
primarily  with  technical  aspects  of  the  database,  in 
contrast  to  the  DA,  who  is  more  concerned  with  infor- 
mation pxDlicy  and  interacts  with  management  and 
users.  The  DBA  will  continue  in  this  role  once  the 
information  system  is  operational.  The  DBA  may  have 
a  small  staff  to  support  this  function.  This  func- 
tion will  continue  throughout  the  life  cycle  of  the 
target  system. 

Data  Dictionary  Administrator  (DDA)  who  will  oversee 
the  operation  of  the  Data  Dictionary  System  (DDS) , 
and  assist  in  the  data  maintenance  process  for  LDD. 
The  DDA  may  be  supported  by  a  staff,  including  a  Li- 
brarian and  possibly  data  entry  personnel.  Data  en- 
try may  also  be  performed  directly  by  designers  and 
analysts  in  the  course  of  their  work.  The  DDA  func- 
tion should  continue  throughout  the  life  cycle  of  the 
target  system,  to  continue  to  maintain  documentation 
about  the  system. 

Data  Dictionary  Librarian  who  will  maintain  the  data 
in  the  data  dictionary  (DD) ,  and  support  the  LDD  and 
systems  development  effort. 

Database  Designers/Analysts  who  will  develop  the  in- 
formation requirements,  logical  database  diagrams, 
models  and  schemas.  They  will  be  expert  in  database 
design,  familiar  with  the  DDS,  and  become  familiar 
with  the  application  areas.  They  will  perform  the 
functions  that  are  the  focus  of  this  report.  Data- 
base designers  will  be  needed  throughout  the  life  cy- 
cle of  the  information  system,  to  maintain  high  per- 
formance and  efficiency  as  the  database  changes 
through  time. 

Project  Managers  who  will  direct  the  LDD  and  systems 
development  projects.  They  will  be  familiar  with  the 
application  areas,  computer  systems,  systems  develop- 
ment practices,  and  become  familiar  with  LDD  pro- 
cedures . 

End-users  of  the  DDS  and  the  information  system  under 
development  who  will  access  and  update  information  in 
the  databases,  and  who  will  generate  reports  and  de- 
cisions from  this  information.  End-users  will  in- 
clude personnel  from  all  organizational  levels  and 
will  perform  the  following  roles: 
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Data  Entry  and  Update 
Data  Retrieval 

-  Data  Analysis 

Data  Management  and  Control 

-  Project  Management 

-  Upper  Management 


3.2     Training  Required  for  LDD 

The  personnel  involved  in  the  LDD  phase  of  development, 
particularly  AAs  and  Application  Specialists,  will  require 
training  so  that  they  will  be  able  to  work  with  database 
designers  as  a  team.  Some  personnel  will  already  be 
knowledgeable  in  these  areas,  but  many  will  need  to  be 
trained.  Project  management  should  arrange  to  have  LDD  per- 
sonnel trained  in: 


o    The  purpose  and  general  procedures  of  LDD. 

o  The  points  of  view  to  be  represented  within  the  sys- 
tem (i.e.,  organizational  components,  functions,  and 
events)  . 

o  Use  of  the  symbology,  such  as  how  to  construct  and 
interpret  E-R-A  and  bubble  diagrams. 

o  Use  of  the  Data  Dictionary  System  or  other  automated 
tool . 


End-users  who  review  the  LDD  may  require  any  of  three 
levels  of  training  in  the  use  of  the  Data  Dictionary  System, 
depending  on  the  extent  of  each  end-user's  responsibility: 


o  Reading  knowledge  of  LDD  reports  that  are  generated 
via  the  DDS ,  to  be  able  to  recognize  when  the  report 
indicates  a  modeling  error. 
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o  Interpretive  capability  to  understand  LDD  reports 
generated  via  the  DDS ,  to  be  able  to  recognize  what 
is  wrong  in  a  report  that  indicates  a  modeling  error. 

o  Expert  knowledge  of  the  DDS  procedures  and  an  under- 
standing of  the  products  of  LDD,  to  be  able  to 
correct  errors  in  modeling  detected  in  DDS  reports. 


3.3     Project  Planning  and  Management  Requirements 


The  systems  development  Project  Manager  and  the  LDD 
Manager  should  plan  for  and  control  the  systems  development 
project  so  that  a  high  quality  LDD  results.  In  addition  to 
the  activities  of  traditional  management  roles,  managers  in 
these  positions  must  determine  that  several  procedures  have 
been  adopted  before  the  project  begins. 

The  Project  Manager  must  be  sure  that  good  methodolo- 
gies have  been  selected  or  developed  for  the  Needs  Analysis, 
Requirements  Analysis,  LDD,  and  other  phases.  In  addition, 
it  is  necessary  to  determine  that  these  methodologies  are 
coordinated  according  to  a  schedule  so  that  the  results  of 
previous  and  parallel  phases  are  available  for  use  by  other 
phases.  The  schedule  should  also  include  various  types  of 
training  for  personnel  working  on  parallel  phases.  Further, 
the  Project  Manager  must  decide  on  a  strategy  for  LDD 
development  that  will  support  the  breadth  of  scope  and  depth 
of  detail  to  be  encountered  in  analyzing  the  target  system. 

The  Logical  Database  Design  Manager  will  fill  a  similar 
role  for  the  LDD  phase.  The  LDD  Manager  will:  (1)  select  a 
good  LDD  methodology  and  analysis  strategy  suitable  to  the 
type  of  system  under  development;  (2)  coordinate  LDD  train- 
ing with  the  managers  for  parallel  phases;  (3)  coordinate 
LDD  activities  with  the  Requirements  Analysis  Manager,  so 
that  information  will  be  available  for  LDD  to  conform  to  ap- 
propriate schedules;  (4)  define  checkpoints  to  review  the 
progress  of  the  LDD  work;  (5)  determine  the  types  and 
characteristics  of  the  DDS  documentation  and  analysis  re- 
ports to  be  generated  to  support  the  LDD  phases;  and  (6) 
manage  the  synthesis  and  integration  of  information  from 
many  sources  within  the  organization  to  support  LDD. 
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4.      LOCAL  IN FORMAT ION -FLOW  MODELING 


A  Local  Information-flow  Model  (LIM)  is  a  description 
of  the  movement  of  data  collections  such  as  reports,  forms, 
memos,  messages,  transactions,  and  files  to,  from,  and 
within  a  particular  focal  point.  The  focal  point  may  be  an 
organizational  component  (e.g.,  the  personnel  department),  a 
function  or  application  (e.g.,  payroll  processing),  or  an 
event  (e.g.,  a  milestone  in  the  budget  cycle).  The  first 
iteration  of  this  phase  will  produce  a  single  LIM  summariz- 
ing the  inputs  and  outputs  of  the  entire  organization  served 
by  the  database  being  designed.  During  subsequent  itera- 
tions multiple  LIMs  will  be  produced,  each  describing  a  part 
of  the  next  higher-level  LIM.  The  level  of  detail  may  be 
very  high  (e.g.,  very  general  types  of  data  going  into  or 
out  of  an  entire  organization),  intermediate  (e.g.,  reports 
and  other  data  going  into,  out  of,  or  processed  within  an 
office),  or  very  low  (e.g.,  transformation  of  an  employee 
number  into  an  employee  name) ,  depending  on  the  number  of 
iterations  through  the  four  phases  of  logical  database 
design. 

There  are  two  reasons  for  choosing  this  approach: 


1.  Complexity  is  controlled  at  every  stage  of  the  itera- 
tion by  restricting  the  scope  of  each  LIM.  Inter- 
views with  users  can  concentrate  on  the  most  critical 
aspects  of  the  user^s  organization,  function,  or 
event,  with  the  assurance  that  a  higher-level  context 
has  already  been  developed  and  that  details  can  be 
filled  in  later.  The  interviewer  need  not  be 
overwhelmed  with  trying  to  understand  everything  all 
at  once.  Note  that  a  top-down  approach  is 
advisable — starting  from  data  elements  and  working  up 
is  more  likely  to  end  in  a  disastrous  lack  of  direc- 
tion and  an  abundance  of  confusion. 

2.  The  different  aspects — organization,  function,  and 
event — represent  the  fact  that  organizational  struc- 
tures are  important,  but  they  do  not  give  a  complete 
model  of  information  processing.  Functions  and 
responsibilities  are  shared  by  sequential  or  simul- 
taneous access  to  and  transformation  of  data.  All 
aspects  may  be  required  to  give  a  true  picture  of  da- 
tabase requirements.  Note  that  manual  functions 
should  be  analyzed  if  there  is  a  significant  chance 
that  they  will  be  automated  during  the  life  of  the 
d  at abas e . 
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The  general  objective  is  for  a  LIM  to  represent  whatev- 
er an  application  specialist  knows  about  his  or  her  job  and 
organization.  The  LIM  does  not  represent  details  about  how 
information  is  captured  or  derived  before  it  reaches  the  ap- 
plication specialist  or  how  it  is  used  or  processed  after  it 
leaves  her  or  him. 

The  emphasis  of  the  LIM  should  be  on  business  functions 
and  events — that  is,  data,  operations,  and  products  that  are 
basic  to  achieving  organizational  objectives — rather  than  on 
any  particular  technology  for  implementing  those  functions. 
One  reason  for  this  particular  emphasis  is  the  fact  that 
technology  changes  much  more  rapidly  than  the  business  func- 
tions (the  need  for  payroll  is  constant,  but  the  policies 
and  technologies  implementing  it  are  changeable).  A  data- 
base should  be  relatively  stable  and  retain  its  value  over  a 
long  period  of  time — the  time  and  cost  of  data  collection 
and  organization  are  too  great  to  permit  the  database  to  be 
considered  anything  less  than  a  major  capital  investment. 
Another  reason  for  the  emphasis  on  business  functions  is 
that  these  are  familiar  and  well-understood  by  the  data 
users,  who  are  the  people  responsible  for  achieving  organi- 
zational objectives.  The  abstract  concepts  of  data  model- 
ing, introduced  in  the  phase  concerned  with  the  development 
of  the  Conceptual  Schema,  are  generally  not  meaningful  to 
the  user  unless  there  is  some  familiar  context  of  business 
functions.  One  way  of  viewing  the  LIM  is  that  it  is  a  means 
for  relating  the  abstract  External  Schema  (a  part  of  the 
Conceptual  Schema)   to  a  concrete  business  context. 


4.1     Information  Used  to  Develop  the  LIM 

Information  that  is  relevant  to  the  development  of  the 
LIM  may  be  obtained  through  examination  of  documents  or 
through  interviews,  or,  preferably,  through  interviews  based 
on  thorough  preparation  via  documents.  The  following  infor- 
mation is  generally  needed: 


1.  The  nature,  objectives,  structure,  and  scope  of  the 
subsystem  must  all  be  analyzed  to  ensure  compatible 
LIMs.  Both  the  present  and  the  future  should  be  con- 
sidered. Non-routine  operations,  or  operations  that 
are  performed  infrequently,  may  be  particularly 
important — for  example,  end-of-year  accounting  opera- 
tions may  have  unique  but  critical  requirements.  In- 
teractions   with    customers,  vendors,  and  other  parts 
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of  the  external  environment  may  be  very  important. 

2.  Existing  automated  systems  and  other  available 
hardware,  software,  and  data  resources  should  be  stu- 
died to  determine  how  they  interact  with  the  subsys- 
tem being  studied;  the  emphasis  should  be  on  the 
queries,  reports,  and  transactions  that  are  actually 
relevant  rather  than  on  what  is  currently  produced. 
It  is  important  to  maintain  continuity  with  the 
present  while  still  ensuring  sufficient  flexibility 
for  long  term  growth  of  the  information  resource. 
Existing  systems  may  already  have  replaced  certain 
functions  and  as  such  should  themselves  be  "inter- 
viewed." This  can  be  difficult  since  existing  systems 
may  be  poorly  structured  and  documented.  However, 
existing  systems  have  already  solved  problems  —  what 
are  those  problems?  Existing  systems  may  be  enforc- 
ing   policies     that  the  people  are  no  longer  aware  of 

—  what  are  those  policies?  Existing  systems  may 
also  be  creating  data  that  everyone  takes  for  granted 

—  how  are  existing  systems  combining  files,  applying 
algorithms,  etc.? 

3.  The  subsystem's  perspective  on  decisions  must  be 
analyzed.  The  position  titles  and  descriptions  held 
by  decision-makers,  the  business  models  that  they 
use,  the  information  that  they  require,  and  the  rela- 
tionships that  they  have  with  other  decision-makers 
must  all  be  analyzed.  Senior  management  views  (stra- 
tegic planning) ,  middle  management  views  (control  and 
tactical  policy),  and  applications  views  (operations) 
are  all  required  to  give  balance  to  the  total  collec- 
tion of  LIMs.  Historical  and  "what  if"  data  are  par- 
ticularly important  in  analyzing  the  data  flow  of 
higher-level  decision  makers. 

4.  Real-world  rules  and  policies  should  be  studied. 
Geographic  location  requirements  are  particularly  im- 
portant (e.g.,  there  is  little  point  in  designing  a 
highly  integrated  central  database  if  the  policy  is 
to  maintain  local  control  of  data) .  Policies  on  data 
retention  and  archiving  may  also  be  important  (e.g., 
archiving  may  constitute  a  major  information  subsys- 
tem) ,  Security,  privacy,  integrity,  and  error  han- 
dling policies  (including  policies  and  procedures  for 
recovery  from  both  data  processing  and  organizational 
mistakes)  may  have  major  effects  on  the  data  struc- 
tures (for  example,  classified  and  unclassified  data 
may  have  to  be  stored  separately) . 
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5.  A  catalog  of  reports  and  forms  needed  for  routine 
tasks  is  clearly  relevant  to  the  LIM.  Collections  of 
reports  and  forms  are  relevant  to  high-level  LIMs, 
individual  reports  and  forms  are  relevant  to 
intermediate-level  LIMs,  and  parts  of  reports  and 
forms  are  relevant  to  low-level  LIMs.  The  timeliness 
and  quality  of  the  reports  and  forms  should  be 
recorded.  Reports  that  have  outlived  their  useful- 
ness are  irrelevant  to  LDD. 

6.  Collections  of  informal  data  are  also  very  important. 
This  data  can  include  files  or  folders  of  memos  and 
letters  (e.g..  Freedom  of  Information  Act  requests, 
and  customer  complaints  in  writing),  notes  on  tele- 
phone conversations  (e.g.,  payroll  inquiries),  and 
databases  on  personal  computers. 

7.  Formal  reference  data  collections  such  as  FIPS  codes, 
ZIP  codes,  pay  scale  tables,  and  address  or  telephone 
directories  are  relevant. 

8.  "Log"  books  or  lists  may  be  used  to  assign  unique 
numbers,  organize  office  functions,  record  signifi- 
cant events,  or  otherwise  coordinate  activities. 

9.  Other  regular  sources  of  information,  such  as  tele- 
phone contacts,  should  be  carefully  studied,  since 
these  may  be  very  relevant  to  getting  the  job  done. 


10.  Information  from  the  higher-level  GIM  and  the 
higher-level  LIM  which  is  being  subdivided  provide 
context  for  developing  more  detailed  LIMs  in  succes- 
sive iterations  of  the  LDD  cycle.  Once  LDD  has  be- 
gun, the  examination  of  this  information  will  be  the 
first  step  in  providing  a  LIM. 

11.  Quantitative  information  on  volume  of  data  and  fre- 
quency of  processing  for  all  of  the  above.  This  in- 
formation will  be  used  to  help  develop  an  estimate  of 
the  database  workload. 


Since  each  LIM  is  a  refinement  of  the  previous  itera- 
tion of  the  design  cycle,  the  LIM  is  constrained  by  the  pre- 
vious higher-level  LIM  and  External  Schema.  If  deeper 
analysis  uncovers  an  error  at  the  higher  level,  then  that 
higher-level  should  be  corrected  before  proceeding  further. 
Otherwise,  other  lower-level  LIMs,  based  on  the  erroneous 
LIM  and  External  Schema,  may  contain  errors  or  be  incon- 
sistent with  each  other. 
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4.2     Functions  of   the  LIM 


The  primary  function  of  the  LIM  is  to  serve  as  part  of 
the  Global  I nf ormation- f low  Model  (GIM) .  Other  functions  of 
the  LIM  are: 


1.  The  LIM  provides  a  guide  for  the  development  of 
further  details.  Each  iteration  is  based  on  a  decom- 
position of  a  previously  developed  LIM,  unless  the 
focus  is  switched  from  an  organizational  component  to 
a  function  or  event,  in  which  case  the  new  LIMs  are 
based  on  combinations  of  previously  developed  LIMs. 

2.  The  LIM  may  be  used  as  a  guide  to  planning  the 
development  of  a  new  application  program  or  system, 
modifying  an  old  application  program  or  system,  or 
modifying  the  organizational  structure.  In  each 
case,  the  LIM  is  analyzed  to  see  whether  the  flow  of 
data  is  efficient  and  effective;  changes  are  suggest- 
ed if  unused  reports  are  being  produced,  if  similar 
functions  are  being  performed  unnecessarily,  if  func- 
tions that  should  be  performed  by  a  computer  system 
are  being  performed  manually,  or  if  the  data  flow  can 
be  reduced  by  combining  organizational  components 
that  sequentially  process  the  same  data. 

3.  The  LIM  is  also  used  to  collect  information  concern- 
ing the  database  workload.  This  information  is  even- 
tually used  to  optimize  and  evaluate  the  physical  da- 
tabase design. 


4.3     Procedure  for  Developing  the  LIM 


Figure  3  shows  the  five  sequential  steps  in  the 
development  of  the  LIM.  The  steps  are  described  in  the  fol- 
lowing paragraphs. 
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LOCAL  INFOilMATION-FLOW  MODELING  [LIM)  PROCEDURE 


STEP  4. 3. 1 


STEP  4.3.2 


STEP  4.3.3 


STEP  4.3.4 


STEP  4,3.5 


REVIEW  NEEDS 


I 


I 


PLAN  DEVELOPMENT  OF  THE  LIMs 


i 


DEVELOP  LIMs 


I 


DEVELOP  WORKLOAD  WITH 

RESPECT  TO  LIMs 


FIGURE  3 
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4.3.1  Review  Need  for  Analysis. 


The  primary  function  of  this  step  is  to  determine 
whether  the  organizational  component,  function,  or  event 
under  consideration  should  be  subdivided  for  further 
analysis,  or  whether  it  has  already  been  analyzed  suffi- 
ciently . 

The  first  iteration  of  the  logical  database  design 
methodology  will  begin  with  a  preliminary  determination  of 
boundaries — that  is,  which  organizational  components,  func- 
tions, and  events  require  interaction  with  the  proposed  da- 
tabase. Next,  it  is  necessary  to  determine  the  best  method 
for  subdividing  the  design  problem--by  organizational  com- 
ponents, by  functions,  or  by  events.  Generally,  the  first 
few  subdivisions  will  be  along  organizational  boundaries. 
These  boundaries  are  usually  well-defined,  familiar,  and 
non-threatening  to  the  application  specialists.  They  serve 
very  well  in  identifying  broad  classes  of  data,  major  func- 
tions and  events,  and  data. flows. 

Organizational  decomposition  may  be  insufficient,  how- 
ever, for  the  detailed  development  of  data  structures  which 
are  shared  among  different  organizational  components.  Later 
iterations  should  concentrate  on  subdividing  the  functions 
and  events  that  have  been  identified  during  the  study  of  or- 
ganizational subdivisions;  such  functions  and  events  must 
provide  data  to  the  database  and  use  data  from  it,  so  are 
directly  relevant  to  the  structure  of  the  database. 

Since  functions  and  events  frequently  cross  organiza- 
tional boundaries,  their  analysis  may  suggest  the  need  for 
reorganization  to  eliminate  duplicate  or  unnecessary  jobs, 
and  will  almost  certainly  require  cooperation  among  applica- 
tion specialists  from  different  organizational  components. 
Consequently,  such  analysis  is  very  delicate  and  should  not 
be  attempted  too  early  in  the  LDD  process. 

Eventually  it  will  be  determined  that  there  is  no  need 
to  subdivide  any  more  functions  or  events;  the  logical  data- 
base design  process  is  then  "complete,"  although  maintenance 
of  the  LIMs  and  other  products  must  continue  indefinitely. 
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step  £.2«2i    Review  Need  for  Analysis 

Function:  To  determine  whether  more  detail  is 

re qui  red 

Output:  Determination  of  whether  to  subdivide  a 

subsystem 

Team  Members:  User  -  AA,  DA 

Developer  -  AA,  DA 

Tools:  Use  DD  to   report  on  previous  work 

Guidelines:      Decision  involves  both  technical  and 
management  issues 


4.3.2  Determine  Subsystems. 

Once  a  decision  has  been  made  to  subdivide  an  organiza- 
tional component,  function,  or  event,  the  next  step  is  to 
determine  the  appropriate  subdivisions.  Two  situations  may 
be  distinguished: 


1.  The  subdivision  involves  a  further  refinement  of  an 
organizational  component,  function,  or  event.  This 
is  the  normal  case  in  business  systems  analysis,  so 
various  methodologies  from  business  systems  planning, 
organizational  analysis,  and  software  engineering  may 
be  applied.  Either  function-oriented  methodologies 
[DEMA78,  GANE79,  MYER78,  ROSS77]  or  data-oriented 
methodologies  [JACK83,  ORRK82]  may  be  used  as  meas- 
ures of  the  relative  merit  of  different  decomposi- 
tions . 


2.  The  subdivision  involves  a  switch  from  one  type  of 
analysis  to  another.  For  example,  the  previous 
iteration  of  subdivision  was  based  on  organizational 
components,  but  this  iteration  is  to  be  based  on 
functions.  In  this  case,  the  primary  activity  is 
composition,  rather  than  decomposition — the  various 
aspects  of  a  function  that  appear  in  different  organ- 
izational components  must  first  be  joined  together  to 
form  a  coherent  statement  of  the  whole  function,  and 
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then  functional  decomposition  can  proceed  at  later 
iterations.  Clearly,  it  is  extremely  important  that 
data  flow  has  been  carefully  documented  during  previ- 
ous iterations;  data  flow  is  the  primary  clue  to  the 
common  basis  for  different  organizational  perspec- 
tives on  a  single  function.  The  effect  of  a  Data  Dic- 
tionary System  is  to  allow  the  DA  to  combine  an  or- 
ganizational hierarchy,  a  functional  hierarchy,  and 
an  event  hierarchy  into  a  consistent  network  which 
can  be  supported  by  the  database  structure. 


In  either  case,  the  result  will  be  a  list  of  well- 
defined  subsystems — organizational  components,  functions,  or 
events —  of  the  LIM  being  analyzed.  The  subsequent  steps 
will  determine  how  each  subsystem  interacts  with  the  data 
flowing  into  or  out  of  that  LIM,  and  the  data  flowing  from 
or  to  the  other  subsystems. 


Step  4^.2*2     Determine  Subsystems 


Function: 


Determination  of  how  to  subdivide  a 
subsystem 


Output: 


List  of  lower- level  subsystems 


Team  Members: 


User  -  AA,  DA 
Developer  -  AA,  DA 


Symbology: 


Organization  charts,  data-flow  or 
event  diagrams 


Tools: 


Use  DD  to  represent  organizational 
components,  functions,  or  events 


Guidelines: 


Care  is  required  —  poorly  chosen 
subsystems  will  have  overly  complex 
interfaces 
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4.3.3  Plan  Developinent  of  the  LIM. 


This  step  involves  the  development  of  a  detailed  plan 
for  this  iteration  of  the  analysis.  The  plan  may  include 
priorities,  so  that  decomposition  will  consider  critical 
factors  first.     Two  strategies  are  possible: 


1.  Each  step  in  the  subdivision  spawns  a  set  of  indepen- 
dent plans.  Detailed  work  may  proceed  in  parallel, 
given  a  sufficiently  large  staff,  with  the  results 
coordinated  primarily  through  the  data  dictionary. 
The  advantage  of  this  approach  is  that  planning  is 
minimized.  The  disadvantage  is  that  quality  control 
of  the  data  dictionary  becomes  extremely  critical 
during  and  after  execution  of  the  plan.  Synonyms  and 
homonyms  for  functions  and  data  must  be  detected  and 
resolved  quickly  or  different  analysis  paths  will 
unknowingly  overlap,  resulting  in  confusion  and  du- 
plication of  effort.  The  philosophy  of  this  strategy 
is  to  move  quickly  and  solve  problems  later  (possibly 
during  the  development  of  the  GIM) . 

2.  Each  step  in  the  subdivision  involves  the  development 
of  a  single,  coordinated  plan.  Detailed  work  is 
coordinated  in  advance,  so  that  problems  of  synonyms, 
homonyms,  and  duplicated  effort  are  minimized.  The 
advantage  of  this  approach  is  that  overall  control  of 
the  effort  is  maintained.  The  obvious  disadvantage 
is  that  this  approach  requires  extremely  knowledge- 
able DA  and  AA  staff  to  formulate,  monitor,  and  con- 
trol the  execution  of  the  plan.  Also,  more  work  must 
be  done  serially  rather  than  in  parallel. 


In  either  case,  it  is  necessary  to  develop  a  detailed 
project  management  plan,  with  milestones,  time  and  cost  es- 
timates, and  assignments  for  application  specialists  as  well 
as  for  AA  and  DA  personnel. 
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step  4.3.3     Plan  Development  of  the  LIM 


Function: 


Develop  project  management  plan  for 
this  subsystem 


Output : 


Milestones,  time  and  cost  estimates 


Team  Members: 


User  -  AA,  DA 

Developer  -  AA,  DA,  Managers 


Symbology: 


Project  management  charts 


Tools: 


Use  DD  to  represent  project  management 
data  and  boundaries 


Guidelines:       Assignments  must  be  very  specific 


4.3.4  Develop  LIM. 

Various  system  analysis  and  design  methodologies  may  be 
used  in  conjunction  with  a  data  dictionary  to  document  the 
data  flows  that  are  developed.  Either  function-oriented 
methodologies  [DEMA78,  GANE79,  MYER78,  ROSS77 ]  or  data- 
oriented  methodologies  [JACK83,  ORRK82]  are  suitable. 
Whereas  previous  steps  have  involved  consultation  with 
management,  this  step  and  the  following  are  best  accom- 
plished by  short  interviews  (no  more  than  two  hours  per 
iteration)  with  application  specialists.  Reference  material 
and  the  LIM  developed  during  the  previous  iteration  are  used 
to  prepare  for  the  interview  and  to  verify  the  analyst's  in- 
terpretation of  the  application  specialist's  statements. 
All  materials  may  be  made  available  to  the  application  spe- 
cialists in  advance  of  the  interview.  (Note  that  discrepan- 
cies revealed  during  an  interview  should  prompt  further 
questions  rather  than  challenges — the  interview  should  not 
be  threatening.)  Graphical  simplicity  is  very  desirable,  so 
that  untrained  users  can  judge  the  correctness  of  the  LIMs 
that  are  relevant  to  them. 

Useful  types  of  diagrams  include  the  following: 
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1.  An  organization  chart  can  be  used  to  show  the 
hierarchical  relationships  among  organizational  LIMs. 

2.  A  "bubble"  diagram  with  an  organizational  focal  point 
connected  to  other  organizations  by  data  flows  can  be 
used  to  represent  an  organizational  LIM,  as  in  the 
following: 


EXAMPLE  OF  A  LOCAL  IN FORMAT ION -FLOW  MODEL 
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Figure  4 
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3.  A  functional  hierarchy  can  be  used  to  show  the 
hierarchical  relationships  among  the  functional  LIMs. 

4.  A  data-flow  diagram  [DEMA78,  GANE79,  MYER78]  or  ac- 
tion diagram  [ROSS77]  can  be  used  to  show  inputs, 
outputs,  subf unctions ,  and  data  flows  among  the  sub- 
functions  of  a  functional  LIM.  (Note  that  this  type 
of  diagram  shows  two  levels  of  the  LIM  hierarchy.) 

5.  A  Gantt  chart  can  be  used  to  show  the  temporal  rela- 
tionships among  events. 

6.  A  PERT  chart  can  be  used  to  show  the  relationships, 
especially  time  dependencies,  among  functions  and 
events. 

7.  A  state-vector  diagram  [JACK83]  or  a  decision  table 
can  be  used  to  show  additional  details  of  functions 
and  events. 


The  data  dictionary  is  used  to  record  detailed  informa- 
tion that  would  only  confuse  a  diagram;  automated  analysis 
of  program  code,  job  control  language,  and  audit  trails  may 
provide  much  of  the  detail.  The  selectivity  of  data  dic- 
tionary queries  and  reports  helps  to  make  the  details 
comprehensible.  Diagrams  should  be  produced  automatically 
from  the  data  dictionary.  Also,  graphic  input  could  be  a 
means  of  populating  the  data  dictionary  when  this  capability 
becomes  automated  in  the  future. 

A  special  but  important  example  of  data  flow  is  storage 
and  retrieval  of  information  by  an  organizational  component, 
function,  or  event;  the  storage  medium  is  treated  like 
another  organizational  component,  function,  or  event. 

Data  flow  is  used  to  determine  the  formal  consistency 
and  completeness  of  the  analysis — for  example,  whether  each 
data  flow  has  a  source  and  a  sink  (either  may  be  some  inter- 
nal storage  medium) .  The  use  of  a  data  dictionary  is  ex- 
tremely important  in  this  situation  to  ensure  that  all  of 
the  various  aspects  of  the  function  are  considered.  The 
views  of  all  users  who  interact  with  a  function  must  be  re- 
flected in  that  function. 

The  description  of  data  flows  should  generally  include 
one  level  of  decomposition.  For  example,  if  the  data  flows 
in  a  top-level  functional  analysis  are  collections  of  re- 
ports, then  each  data  description  in  the  data  dictionary 
should  include  a  list  of  the  component  reports.     At  a  lower 


-42- 


level,  if  the  data  flows  are  reports,  then  their  descrip- 
tions should  include  subdivisions  of  the  reports — selected 
columns,  or  rows  between  subtotals,  or  the  subtotals  them- 
selves, for  example.  At  a  very  detailed  level,  the  data 
descriptions  would  be  data  elements. 

Information  which  is  useful  in  understanding  the  rela- 
tive importanfce  of  the  functions  and  in  planning  the  next 
iteration  of  this  phase  includes  the  following: 

1.  Staff  time,   in  work-years  or  other     convenient  unit, 
expended  on  performing  the  function. 

2.  The  number  of  staff  personnel     performing     the  func- 
tion. 

3.  The  number  of  locations  where  the     function     is  per- 
formed . 

4.  Whether  there  is  a  single  step  that  consumes     80%  or 
more  of  the  time  spent  on  the  function. 


Step  4.3.4 

Develop  LIMs 

Function: 

Provide  guidance  to  the  development 
of  the  GIM  and  CS 

Output: 

LIMs 

Team  Members 

:   User  -  AA,  DA 

Developer  -  AA,  DBA 

Symbology: 

Use  bubbles  to  represent  organizational 
components,  events,  functions,  or 
external  interfaces.     Use  lines  to 
represent  data  flows. 

Tools: 

Use  DD  to   represent  subsystems 
and  interfaces 

Guidelines: 

Graphical  simplicity  is  desirable 
Use  selectivity  of  DD  reports 
Should  be  easy  for  users  to  understand 
and  critique 
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4.3.5  Develop  Workload  With  Respect  to  LIMs. 


The  primary  function  of  this  step  is  to  develop  a  prel- 
iminary description  of  the  workload:  the  frequency,  se- 
quence, and  selectivity  with  which  functions  use  or  produce 
data,  and  the  volume  of  stored  data  [JEFF82,  SUST84]  .  The 
workload  will  be  used  during  the  development  of  the  External 
Schemas  to  determine  whether  the  Conceptual  Schema  can  sup- 
port the  LIM,  and  what  paths  must  be  taken  through  the  Con- 
ceptual Schema  to  obtain  the  data  required  by  the  LIM.  It 
will  also  be  used  to  determine  whether  certain  functions 
should  be  automated.  The  workload  must  be  used  during  the 
development  of  the  Internal  Schema  (physical  database 
design)  to  determine  appropriate  physical  record  structures, 
record  placement  in  areas,  access  methods,  loading  factors, 
indexes,  and  other  parameters.  Accordingly,  this  step  must 
be  performed  during  the  most  detailed  iteration  of  function- 
al analysis;  it  may  be  performed  at  earlier  steps  to  provide 
additional  quality  control  for  the  LIMs  and  Conceptual  Sche- 
ma. 

At  this  phase,  the  workload  is  described  in  terms  of 
data  collections  that  may  be  very  different  from  the  logical 
records  that  will  eventually  constitute  the  final  Conceptual 
Schema.  In  particular,  the  level  at  this  phase  may  be  very 
high  (e.g.,  data  objects  like  "employee,"  "project,"  and 
"part"  rather  than  data  elements  like  " employee- fi rst-name , " 
"est imated-proj ect-cost , "  and  "part-quanti ty-in-warehouse" ) 
and  the  grouping  of  data  may  be  quite  arbitrary  (e.g.,  "em- 
ployee" may  include  data  about  skills,  projects,  and  organi- 
zations associated  with  the  employee) .  Eventually  these 
data  objects  will  be  restructured  to  form  a  database,  so  it 
is  important  to  be  able  to  map  this  preliminary  workload 
into  appropriate  paths  through  that  database. 

The  information  to  be  collected  and  stored  in  the  data 
dictionary  should     include  the  following: 


1.  The  volume  (number  of  instances)  of  each  data  collec- 
tion (e.g.,  the  number  of  employees,  projects,  and 
par  ts)  . 

2.  The  priority  of  the  function  (e.g.,  "an  airline 
reservation  must  be  confirmed  within  20  seconds"  and 
"a  marketing  analysis  on  advance  reservations  must  be 
available  within  2  hours  of  a  request"). 
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3.  The  frequency  of  execution  of  the  function. 

4.  The  sequence  with  which  data  collections  are  accessed 
by  the  function,  and  the  source  of  the  data  from  in- 
put or  database  (e.g.,  start  with  "employee,"  then 
access  "project,"  then  access  "project-manager"  to 
determine  who  "manages"  a  given  employee). 

5.  The  parts  of  each  data  collection  that  are  used  to 
decide  whether  a  given  instance  of  that  data  collec- 
tion is  relevant  (e.g.,  "employee-name"  identifies 
the  required  "employee"  data). 

6.  For  each  of  the  parts  of  data  collection,  the  number 
of  relevant  instances   (e.g.,  "1"). 

7.  For  each  relevant  data  collection  accessed  by  the 
function,  the  parts  that  are  needed  for  retrieval  by 
the  function  (e.g.,  "employee-project"  is  the  only 
retrieved  part  of  the  "employee"  data) .  If  applica- 
ble, the  preferred  order  is  desirable  (e.g.,  the 
"employee- project"  data  is  to  be  sorted  by  "project- 
number"  )  . 

8.  The  parts  of  each  relevant  data  collection  that  are 
needed  for  update  by  the  function  (e.g.,  "employee- 
hours"  is  the  only  updated  part  of  the  "employee" 
data) . 

9.  At  each  point  where  the  function  branches,  the  frac- 
tion of  the  time  each  branch  is  taken  (e.g.,  90%  of 
the  time  "employee-project"  will  be  non-null,  so 
"project"  will  be  accessed,  and  10%  of  the  time  it 
will  be  null  so  the  path  will  terminate). 
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step  £.5^.2    Develop  Workload  with  Respect  to  LIMs 

Function:  Develop  preliminary  specifications  for 

physical  design 

Output:  LIMs  with  volume,  frequency, 

sequence,  and  selectivity 

Team  Members:  User  -  AA,  DBA 

Developer  -  AA,  DBA  and  Analysts 

Symbology:         LIM  diagrams 

Tools:  Use  DD  to  store  workload  information 

to  be  used  for  physical  design 

Guidelines:       Keep  the  scope  limited  to  a  single 
appl icat ion 


-46- 


5.     GLOBAL   IN  FORMAT  ION -FLOW  MODELING 


A  Global  Inf or mat ion- flow  Model  (GIM)  is  basically  an 
interconnected  collection  of  all  of  the  Local  Information- 
flow  Models  (LIMs) .  Its  structure  is  quite  complex:  it  com- 
bines up  to  three  hierarchies  of  LIMs  (a  hierarchy  based  on 
organizational  components,  another  based  on  functions,  and 
possibly  another  based  on  events);  these  must  be  intercon- 
nected in  terms  of  data  flow,  which  itself  may  be  a  complex 
network  of  data  objects,  as  well  as  other  interrelationships 
such  as  organizational  authority  and  responsibility.  A  Data 
Dictionary  System  (DDS)  is  strongly  recommended  to  manage 
the  GIM.  In  an  extremely  complex  situation,  where  even  a 
DDS  is  unable  to  present  the  mass  of  information  in  a  mean- 
ingful way,  multiple  GIMs  may  be  developed,  each  represent- 
ing a  major  subsystem  loosely  connected  to  the  other  GIMs. 
Note,  in  particular,  that  the  GIM,  like  the  LIM,  must  gen- 
erally represent  both  automated  and  manual  data,  and  both 
current  and  planned  functions. 

The  major  task  involved  in  developing  the  GIM  is  simply 
adding  the  new  details  represented  by  each  new  LIM.  The  new 
LIMs  must  be  verified  for  consistency  with  higher-level 
LIMs,  names  must  be  reconciled  with  existing  names,  and  the 
different  perspectives  (organization,  function,  and  event) 
must  be  interrelated.  These  are  basically  responsibilities 
of  the  DA  with  assistance  from  the  AAs  in  detecting  and 
resolving  potential  problems  in  performance,  cost,  reliabil- 
ity, security,  and  the  like.  The  DA  should  not  require 
direct  access  to  the  users. 

The  GIM  may  be  represented  in  various  forms  according 
to  the  methodology  chosen.  A  diagram  may  consist  of  ovals 
or  rectangles  representing  the  subsystems,  and  labelled 
lines  representing  the  data  flows.  This  is  a  simple 
source-sink  model  which  is  very  useful  for  communicating 
with  users.  Other  representations  of  the  GIM  include  many 
different  types  of  matrices  showing  the  interactions  of  or- 
ganizational components,  functions,  events,  and  data  objects 
with  each  other  [MART82i.  A  data  dictionary  is  recommended 
for  the  primary  means  of  representation,  from  which  diagrams 
and  matrices  can  be  produced  selectively  and  automatically. 
Also,  the  data  dictionary  is  quite  suitable  for  representing 
details  that  would  be  very  confusing  in  a  diagram  or  matrix, 
such  as  the  Local  Information-flow  Models  (LIMs)  and  their 
relationships  with  the  GIM,  the  relationships  between  names 
in  the  GIM  and  in  the  LIMs,  and  details  of  database  work- 
load . 
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Some  methodologies  dispense  with  the  GIM  [NAVA8  2]  and 
begin  the  design  of  the  Conceptual  Schema  with  a  small 
number  of  applications,  then  add  more  applications,  continu- 
ally integrating  the  new  applications  with  the  old  Conceptu- 
al Schema.  This  has  the  advantage  of  facilitating  quick 
development  of  a  prototype,  but  has  the  disadvantage  of  pos- 
sible major  revisions  of  the  Conceptual  Schema  [JEFF82]. 
The  safer  procedure  is  to  develop  a  GIM  with  careful  control 
of  detail,  so  that  the  level  of  effort  is  reasonable  yet  the 
GIM  provides  sufficient  detail  to  guide  the  development  of  a 
relatively  stable  Conceptual  Schema.  This  procedure  is  also 
likely  to  uncover  important  new  interrelationships  among 
LIMs,  such  as  unexpected  interrelationships  among  organiza- 
tional components,  and  dependencies  within  them. 

Note  the  similarity  of  the  Local  Information-flow  Model 
and  Global  Information-flow  Model  development  to  Business 
Systems  planning  (BSP)  [MART82],  which  is  also  based  on  data 
flow.  The  primary  difference,  which  is  extremely  important, 
is  that  each  iteration  of  the  Local  Information-flow  Model 
and  Global  Information-flow  Model  is  followed  by  the 
development  of  the  Conceptual  Schema  and  External  Schemas  in 
the  procedures  described  in  this  paper.  This  cyclical  and 
iterative  approach  balances  the  data  flow  perspective  with 
the  data  structure  perspective,  so  that  neither  will  be  em- 
phasized at  the  expense  of  the  other.  BSP,  however,  em- 
phasizes the  data  flow  perspective  almost  to  the  exclusion 
of  the  data  structure  perspective;  high  level  data  objects 
are  identified,  but  their  relationships  and  detailed  struc- 
tures must  be  developed  by  another  methodology. 


5.1     Information  Used  to  Develop  the  GIM 


Information  that  is  relevant  to  the  development  of  the 
GIM  is  obtained  primarily  from  the  previous  iteration  of  the 
GIM  and  the  newly  developed  LIMs.  Other  types  of  informa- 
tion are  similar  to  those  used  to  develop  the  LIM,  except 
that  they  are  at  a  higher  organizational  level. 


1.  The  nature,  objectives,  and  scope  of  the  organization 
must  be  analyzed  to  ensure  a  compatible  GIM. 

2.  The  organizational  perspective  on  decisions     must  be 
d  eterm  ined . 
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3. 
4. 
5. 


Organizational  rules  and  policies  must  be 
Reports  and  forms  must  be  examined. 
Available  resources  must  be  determined. 


a  nalyzed . 


5.2     Functions  of  the  GIM 


The  primary  function  of  the  GIM  is  to  guide  the 
development  of  the  Conceptual  Schema.  Other  functions  of 
the  GIM  are: 


1.  The  GIM  provides  context  for  the  development  of  the 
next  iteration  of  the  LIMs. 

2.  The  GIM,  like  the  LIMs,  may  assist  in  management 
planning  to  increase  efficiency;  the  GIM  provides  a 
wider  perspective  on  reducing  data  flow  through 
changes  in  functions  and  organizational  structures. 

3.  The  GIM  may  also  be  used  to  design  the  interfaces 
among  separate,  loosely  connected  Conceptual  Schemas, 
as  may  be  appropriate  among  several  large  systems  or 
a  distributed  database  system. 


5.3     Procedure  for  Developing  the  GIM 

Figure  5  shows  the  four  sequential  steps  in  the 
development  of  the  GIM.  The  steps  are  described  in  the  fol- 
lowing paragraphs. 
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GLOBAL  INFORMATION-FLOW  MODELING  [GIM)  PROCEDURE 


STEP  5.3.  1 


STEP  5.3.2 


STEP  5.3.3 


STEP  5.3.4 


VERIFY  THE 

LIMs 

CONSOLIDATE 

Lins 

I 


REFINE  BOUNDARY  OF  AUTOMATED 
INFORMATION  SYSTEMS 


I 


PRODUCE  GIM 


FIGURE  5 
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5.3.1  Verify  the  LIMs. 


The  LIMs  are  organized  into  a  hierachy  of  organization- 
al components,  a  separate  but  interrelated  hierarchy  of 
functions,  and,  possibly^  a  separate  but  interrelated  hiera- 
chy of  events.  The  function  of  this  step  is  to  verify  that 
each  new  LIM  is  consistent  with  the  objectives  and  con- 
straints of  the  next  higher  level  LIM  in  its  hierarchy.  Any 
inconsistencies  require  modification  of  either  the  lower- 
level  LIM  or  the  higher-level  LIM.  In  the  latter  case, 
modifications  may  propagate  all  the  way  up  the  hierarchy  and 
possibly  affect  the  other  hierarchies  as  well:  such  modifi- 
cations may  also  propagate  to  the  GIM,  Conceptual  Schema, 
and  External  Schemas.  The  following  are  the  major  con- 
sider at ions : 


1.  The  data  flow  of  a  LIM  must  be  consistent  with  that 
of  its  higher-level  LIM.  Each  data  object  at  the 
lower  level  should  either  appear  at  the  higher  level, 
or  be  a  part  of  a  higher-level  data  object,  or  have 
both  source  and  sink  within  the  lower-level  LIMs. 
For  example,  assume  that  the  higher  level  is  a 
department,  and  the  lower  level  consists  of  the 
branches  within  it.  Data  received  by  one  branch  from 
an  outside  source  must  be  traceable  to  a  departmental 
data  source,  but  data  sent  to  another  branch  might 
not  appear  at  the  departmental  level. 

2.  Similarly,  the  data  flow  of  the  higher-level  LIM  must 
not  be  greater  than  the  data  flow  of  the  LIMs  that 
comprise  it. 

3.  More  generally,  the  scope  of  a  lower-level  LIM  must 
be 'consistent  with  the  scope  of  the  higher-level  LIM, 
where  scope  includes  such  non-data  considerations  as 
timing,  resources,  general  objectives,  and  interrela- 
tionships with  other  hierarchies.  For  example,  the 
branch  should  not  have  more  time  to  perform  a  task 
than  is  available  to  the  department,  and  should  not 
perform  functions  that  are  not  assigned  to  the 
d  epar  tment . 

4.  Similarly,  the  scope  of  the  higher-level  LIM  must  not 
be  greater  than  the  scope  of  the  LIMs  that  comprise 
it. 
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5.  If  workloads  have  been  developed,  the  workload  of  a 
LIM  must  be  consistent  with  that  of  its  higher-level 
LIM.  Data  volumes  should  be  consistent.  Each  path 
through  the  lower-level  data  must  either  be  entirely 
contained  within  the  lower-level  LIM  or  must  be 
traceable  to  a  path  in  the  higher-level  data.  Prior- 
ity, frequency,  timing  dependencies,  and  numbers  of 
instances  should  be  consistent. 

6.  Similarly,  all  of  the  paths  in  the  higher-level  LIM 
must  appear  in  the  lower-level  LIM. 


Step  5.3.1    Verify  the  LIMs 

Function:  To  verify  that  each  new  LIM  is 

consistent  with  the  objectives  and 

constraints  of  the  next  higher  level 

Output:  LIMs  organized  in  a  hierarchy  of 

organizational  components,  functions, 
or  events 


Team  Members 

Symbology: 
Tools: 


User  -  AA,  DA 
Developer  -  AA,  DA 

LIM  diagram 

Use  DD  to  change  entries  and  determine 
effects  of  change 


Guidelines:       Verify  LIMs  from  top  down 


5.3.2  Consolidate  LIMs. 

The  function  of  this  step  is  to  resolve  synonyms  that 
arise  when  different  subsystems  use  different  names  for  the 
same  data  flow  and  homonyms  that  arise  when  different  sub- 
systems use  the  same  name  for  different  data  flows.  Once 
detected,  synonyms  and  homonyms  are  relatively  easy  to 
resolve.  One  of  the  synonyms  is  chosen  for  the  GIM  name, 
while  the  others  are  retained  in  the  data  dictionary  as  al- 
ternate names  for  the  appropriate  LIMs.  For  example, 
"part#"  could  be  the  preferred,  global     name,     while  "part- 
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number"  could  be  used  within  the  context  of  a  particular 
function,  and  be  represented  in  the  data  dictionary  as  an 
alternate  name.  Only  one  object  can  be  assigned  the  homonym 
for  its  GIM  name;  each  of  the  other  objects  is  assigned  a 
new,  unique  name,  and  the  homonym  is  assigned  as  an  alter- 
nate name.  For  example,  if  "price"  refers  to  both  retail 
and  wholesale  price,  then  "price"  could  be  used  globally  to 
refer  to  retail  price,  or  locally  within  a  particular  func- 
tion to  refer  to  wholesale  price;  "wholesale-price"  could  be 
used  to  refer  to  wholesale  price  globally.  Alternatively, 
"retail-price"  and  "wholesale-price"  could  be  used  globally, 
and  "price"  only  locally. 

Detection  of  synonyms  is  largely  a  manual  process,  but 
there  are  some  clues  that  can  be  provided  by  the  DDS  or  oth- 
er computerized  tool: 


1.  The  primary  means  for  detecting  possible  synonyms  is 
data  flow  analysis,  which  can  be  performed  by  the 
DDS — for  example,  the  DDS  may  be  able  to  produce 
groups  of  data  objects  that  have  identical  sources 
and  sinks,  which  would  indicate  that  the  group 
members  could  be  the  same  data  object  with  different 
names  in  different  subsystems. 

2.  Name  analyses,  such  as  keyword  in  context,  are  useful 
for  suggesting  possible  synonyms. 

3.  Data  element  analysis  may  also  help  in  suggesting 
possible  synonyms  by  identifying  data  elements  that 
have  similar  characteristics,  such  as  their  COBOL 
pictures  or  legal  values. 


Detection  of  homonyms  should  be  primarily  a  process 
performed  by  the  DDS — the  DDS  should  reject  any  attempt  to 
add  conflicting  characteristics  to  any  data  object.  Situa- 
tions in  which  two  distinct  objects  have  the  same  names  and 
all  other  characteristics  must  be  detected  manually;  howev- 
er, if  each  object  has  a  meaningful  textual  description,  it 
is  relatively  simple  to  compare  descriptions  to  determine 
whether  they  should  be  combined,  or  should  be  given  separate 
names.  Homonyms  that  are  not  resolved  at  this  step  may  be 
resolved  at  a  later  step  or  later  iteration  of  this  step 
when  more  characteristics  are  known  and  therefore  there  is 
more  likelihood  of  a  conflict  being  detected  by  the  DDS. 
Resolution  at  this  step  is  a  convenience  but  not  a  necessi- 
ty. 
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step  5.3.2 


Consolidate  LIMs 


Function: 


Resolution  of  synonyms  and  honomyms 


Output : 


One  uniform  model 


Team  Member 


s : 


User  -  DA 
Developer  -  DA 


Symbology: 


Bubbles  and  lines 


Tool s: 


Use  DD  to  store  alternate  names 

Use  name  analyses  such  as  keyword  in 

context  to  detect  synonyms 


Guidelines:       Standardize  names  in  GIM 

Use  local  synonyms  whenever  appropriate 
in  LIMs 


5.3.3  Refine  Boundary  of  Automated  Information  System  (AIS). 

The  function  of  this  step  is  to  refine  the  boundary  of 
the  automated  information  system  that  is  being  designed. 
This  may  reduce  the  scope  of  the  logical  database  design  and 
therefore  reduce  the  effort  expended  in  subsequent  phases. 
Note  that  the  final  boundary  will  generally  be  three  dimen- 
sional: organizational  components,  functions,  and  events. 
They  must  all  be  included  in  or  excluded  from  the  logical 
database  design. 

The  criteria  for  drawing  the  boundary  are  primarily 
based  on  upper  management  goals  as  applied  by  the  DA  with 
possible  technical  advice  from  the  DBA. 

The  boundary  may  be  represented  on  a  data  flow  diagram 
by  a  line,  in  a  subsystem/data  matrix  by  highlighting  sub- 
systems within  the  boundary  or  omitting  subsystems  outside 
the  boundary,  and  in  the  data  dictionary  by  a  keyword  or  by 
relationships  between  a  specific  system  and  the  subsystems 
within  the  boundary. 
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step  5^.3^.2     Refine  Boundary  of  Automated  I nf ormat ion 
System  (AIS) 

Function:  Reduce  scope  and  refine  the  boundaries 

of   the  AIS 

Output:  Models  of   the  AIS 

Team  Members:   User  -  DA  and  upper  level  managers 
Developer  -  DA  and  DBA 

Symbology:         Bubbles  and  lines 

Tools:  Use  DD  to   represent  specific  system  and 

subsystems  within  the  boundary 

Guidelines:       Criteria  for  refining  boundary  are 
based  on  upper  management  goals 
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EXAMPLE  OF  A  GLOBAL  IN FORMAT ION -F LOW  MODEL 


/  \ 
/Organizational\ 
Component 

Providing  Data 
\      to  AIS  / 

\  / 


Bound  a  r  y- 


/  \ 
/Organizational\ 
Component  |<- 
Dependent  | 
\      on  AIS  / 
\  / 


/  \ 
/Organizational\ 
Component  |<- 
Interacting  | 
\      with  AIS  / 
\  / 


/  \ 
/Organi  zational\ 
"1  Component 
I  Independent 
\        of  AIS  / 

\  / 


/  \ 
/Organizationa  1\ 
Component  ! 
Interacting  I 
\      with  AIS  / 
\  / 


Automated 
Information 
System 
(AIS) 


/  \ 
/Organizational\ 
Component 
Interacting 
\      with  AIS  / 
\  / 


Figure  6 
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5.3.4  Produce  GIM. 


The  function  of  this  step  is  to  provide  additional 
quality  assurance  and  documentation  for  the  GIM.  Use  of  a 
data  dictionary  is  recommended.  Details  of  how  the  data 
dictionary  represents  the  GIM,  what  quality  assurance  re- 
ports are  provided,  and  what  documentation  is  to  be  produced 
must  be  determined  by  each  organization  to  suit  its  own 
capabil ities . 


Step  5.3.4     Produce  GIM 

Function:  Provide  final  review  and  documentation 


for   the  GIM 


Output : 


Specification  of  components  of  GIM 


Team  Members: 


User  -  DA  and  DBA 
Developer  -  DA  and  DBA 


Symbology: 


Bubbles  and  lines 


Tools: 


Use  DD  for  corrections 


Guidel ines: 


Quality  assurance  must  be  provided 
by  application  experts 
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6.     CONCEPTUAL  SCHEMA  DESIGN 


A  Conceptual  Schema  (CS)  is  a  description  of  the  logi- 
cal (hardware-  and  sof tware- independent )  structure  of  the 
data  required  by  an  organization.  The  phases  concerned  with 
development  of  the  Local  Information-flow  Models  (LIMs)  and 
Global  Information-flow  Model  (GIM)  concentrated  on  the  in- 
teractions between  data  and  organizations,  functions,  or 
events;  the  structure  and  meaning  of  the  data  were  not 
analyzed  beyond  the  relatively  simple  resolution  of  synonyms 
and  homonyms.  This  phase  concentrates  on  the  deep  explora- 
tion of  structure  and  meaning  in  terms  of  three  important 
concepts:  entity,  relationship,  and  attribute.  These  con- 
cepts correspond  very  closely  to  the  natural  language  con- 
structs of  noun,  verb,  and  adjective.  The  following  para- 
graphs, which  define  these  concepts  and  provide  brief  exam- 
ples, may  be  omitted  by  readers  familiar  with  the  Entity- 
Relationship-Attribute  Model    [CHEN80,  CHEN81,  CHEN82]. 


1.  An  entity  is  a  type  of  real-world  object  or  concept. 
For  example,  "employee,"  "project,"  and  "position 
description"  may  be  entities  of  interest  to  an  organ- 
ization. Note  that  only  "employee"  is  a  physical 
object — "project"  and  "position  description"  are  both 
concepts.  To  appreciate  the  difference,  consider 
that  a  "position  description"  may  be  recorded  on  a 
piece  of  paper.  If  the  paper  is  copied  or  reproduced 
electronically  in  a  database,  the  medium  is  changed, 
but  the  concept — the  position  description — is  still 
the  same.  Therefore,  the  entity  of  interest  is  the 
message,  not  the  medium. 

2.  A  relationship  is  a  type  of  association  or  correspon- 
dence among  entities.  For  example,  "works  on"  may  be 
a  relationship  between  "employee"  and  "project."  An 
instance  of  a  relationship  is  a  fact  or  assertion — 
e.g.,  the  phrase  ^"12345"  "works  on"  "design"'  could 
express  the  fact  that  the  "employee"  identified  by 
the  "employee  number"  "12345"  is  associated  with  the 
"project"  entity  identified  by  the  "project-name" 
"design"  through  the  relationship  "works  on."  This 
example  involves  two  entities  and  two  instances  of 
entities.  A  relationship  may  involve  only  one  enti- 
ty. For  example,  """design"  "precedes"  "implementa- 
tion"** is  a  relationship  involving  two  instances  of 
the  entity  "life-cycle  phase."  A  relationship  may 
also  involve  more  than  two  entities —  e.g.,  '"12345" 
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"works     on"     "design"     "using"  "Entity-Relationship- 
Attribute  Approach"'   is  an  instance  of  a  relationship 
("works     on"    "using")   among  three  entities  ("em- 

ployee," "project,"  and  "technique"). 

3.  An  attribute  is  a  property  or  characteristic  which 
describes  an  entity  or  relationship.  For  example, 
the  "employee"  entity  may  have  attributes  such  as 
"birth  date,"  "marital  status,"  and  "annual  salary," 
while  the  "works  on"  relationship  may  have  attributes 
such  as  "hours  per  week,"  or  "hours  to  date."  Every 
entity  must  have  an  attribute  or  collection  of  attri- 
butes that  distinguishes  among  entity  instances 
(e.g.,  an  "employee  number"  identifies  a  particular 
"employee").  A  relationship  may  be  without  attri- 
butes, since  each  instance  is  identified  by  the  enti- 
ties that  it  associates  (e.g.,  the  relationship  in- 
stance '"design"  "precedes"  "implementation"'  is 
uniquely  identified  by  "design"  and  "implementation," 
in  that  order )  . 


6.1     Information  Used  to  Develop  the  CS 


Most  of  the  information  that  is  relevant  to  the 
development  of  the  CS  is  provided  indirectly  by  the  GIM. 
Entities  are  the  subjects  of  the  data  flows  that  were  iden- 
tified by  the  GIM,  but  they  are  generally  not  the  data  flows 
themselves.  For  example,  a  personnel  report  is  not  an  enti- 
ty unless  there  is  system  for  tracking  the  production  or 
distribution  of  the  report,  in  which  case  each  instance  of 
the  report  might  be  identified  by  a  control  number.  The 
subjects  of  the  personnel  report,  e.g.,  "employee"  and  "pro- 
ject," would  be  entities. 


6.2     Functions  of  the  CS 


The  primary  function  of  the  CS  is  to  provide  a  single 
logical  structure  for  the  database.  Other  functions  in- 
clude : 


1.  The  CS  provides  input  to  the  External  Schema  Design 
Phase. 

2.  The  CS  provides  guidance  in  the  choice  of  a  data 
model  (e.g.,  either  a  hierarchical,  network,  or  rela- 
tional data  model  may  most  easily  represent  the  CS)  . 

3.  The  CS  provides  guidance  in    the     choice     of     a  DBMS 
(e.g.,  a  DBMS  that  easily  represents  the  CS) . 

4.  The  CS  provides  guidance  in  the  development  and 
evaluation  of  the  physical  database  design  (the  CS 
provides  the  definition  of  the  logical  data  structure 
that  the  physical  database  must  support). 


The  output  of  this  phase  may  include  the  following: 


1.  For  each  entity  of  fundamental  interest  to  the  organ- 
ization, its  name,  identifier  (key),  other  attri- 
butes, synonyms,  textual  description,  and  relation- 
ships with  other  entities. 

2.  Entity-Relationship-Attribute  diagrams  [CHEN82]. 

3.  Security,  privacy,  and  integrity  constraints. 

4.  Normalized  relations    [BEER79,  BERN76,  ZANI82]. 


6.3     Procedure  for  Developing  the  CS 


Figure  7  shows  the  six  steps  in  the  development  of  the 
CS .  The  last  step  may  reveal  redundancies  that  will  suggest 
repeating  some  or  all  of  the  preceding  steps.  The  steps  are 
described  in  the  following  paragraphs. 
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CONCEPTUAL  SCHEMA  (CS)  DESIGN  PROCEDURE 


Step  b. 3. 1 


Step  6.3.2 


Step  5.3.3 


Step  5.3.4 


Step  6.3.5 


Step  6.3.6 


LIST  ENTITIES  AND 
IDENTIFIERS 


I 


GENERATE  RELATIONSHIPS 
AMONG  ENTITIES 


I 


ADD  CONNECTIVITY  TO 
RELATIONSHIPS 


I 


ADD  ATTRIBUTES  TO 
ENTITIES 


I 


DEVELOP  ADDITIONAL 
DATA  CHARACTERISTICS 


I 


NORMALIZE  THE  COLLECTION 
OF  ENTITIES 


FIGURE  7 
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6.3.1  List  Entities  and  Identifiers. 


The  primary  function  of  this  step  is  to  develop  a  list 
of  entities  that  must  be  represented  in  the  CS .  Because  of 
the  inherent  complexity  of  the  real  world  that  the  CS 
models,  this  is  considerably  more  difficult  than  one  might 
assume.  Some  reasonable  guidelines  are  presented  below  and 
discussed  in  the  following  paragraphs. 


1.  A  data  flow  may  suggest  one  or  more  entities. 

2.  An  entity  must  have  a  meaningful     name    and  descrip- 
tion. 

3.  An  entity  must  have  an  identifier. 


In  general,  entities  are  the  subjects  of  the  GIM  data 
flows;  an  entry  in  a  report  or  form  is  usually  an  attribute 
which  can  identify  or  describe  an  entity.  For  example,  an 
assignment  matrix  could  have  "project#"  as  the  column  head- 
ing, "employee-number"  as  the  row  title,  and  an  "X"  or  blank 
as  an  indicator  of  assignment.  The  matrix  itself  is  not  an 
entity  in  most  cases,  but  the  "project#"  and  "employee- 
number"   identify  entities. 

An  entity  should  have  a  meaningful  name  consisting  of  a 
noun  or  noun  phrase.  If  there  is  no  obvious  choice  for  the 
name  of  a  proposed  entity,  then  it  is  likely  that  it  is  not 
an  entity.  In  addition,  the  entity  must  have  an  extended 
description  that  addresses  topics  such  as  the  lifetime  of  an 
entity  instance  (e.g.,  is  a  "dependent"  removed  from  the  da- 
tabase when  an  "employee"  resigns?)  and  criteria  for  inclu- 
sion (e.g.,  does  "employee"  include  both  hourly  and  salaried 
personnel?).  For  additional  guidance,  refer  to  [ATRE80, 
CHEN82,  CURT82,  KAHN79 ,  ROUSBl,  SHEP76,  SMIT78,  SUST83, 
TEOR8  2] . 

An  entity  must  have  one  or  more  identifiers  (or  keys). 
Each  identifier  is  an  attribute  or  combination  of  attributes 
which  distinguishes  among  entity  instances.  For  example, 
"employee-number,"  " pr oj ect-name, "  and  "PD#"  could  be  the 
identifiers  of  "employee,"  "project,"  and  "position  descrip- 
tion." The  identifier  of  an  entity  may  be  composed  of  iden- 
tifiers of  other  entities.  For  example,  the  identifier  of 
"assignment"  could  be  composed  of  the  combination  of  the  at- 
tributes "employee-number"  and  "proj ect-name. "  Note  that 
neither       single      attribute      would      uniquely     identify  a 
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particular  "assignment."  Note  also  that  "assignment"  could 
equally  well  be  identified  by  "SS#"  and  "project#,"  or  even 
by  a  unique  "assignment-number" — the  important  fact  at  this 
point  is  that  an  identifier  can  be  found,  so  that  "assign- 
ment"  is  a  legitimate  entity. 

Analysis  of  the  preceding  example  demonstrates  that 
care  must  be  exercised  in  finding  an  identifier  and  defining 
an  entity: 


o     If  the  "employee"    is  released  from  the  "project,"  is 
a  record  of  the  "assignment"  retained? 

o    If  so,  how  can  such  an    assignment     be  distinguished 
from  a  current  assignment? 

o    If  the  "employee"   is  returned  to     the     "project,"  is 
the  "assignment"   still  the  same? 


This  analysis  may  indicate  that  the  "employee-number" 
and  "pr oj ect-name"  cannot  constitute  the  identifier.  Anoth- 
er attribute,  such  as  " assignment-st ar ti ng-date-and- time , " 
may  be  needed  for  uniqueness.  Another  possibility  is  the 
"assignment-number;"  the  rules  for  handling  multiple  assign- 
ments could  then  be  represented  by  the  algorithm  for  deter- 
mining the  "assignment-number."  For  example,  if  the  first 
"assignment-number"  is  1,  and  each  succeeding  "assignment- 
number"  is  increased  by  1,  then  multiple  assignments  of  a 
given  "employee"  to  a  given  "assignment"  can  always  be  dis- 
tinguished . 

Entities  may  be  determined  "top-down"  by  abstracting 
from  the  data  flows  and  the  GIM,  or  "bottom-up"  by  syn- 
thesizing from  identifiers  and  their  attributes  [SHEP76]. 
The  latter  approach  is  greatly  simplified  by  the  use  of  a 
computer-based  normalization  program,  as  described  in  step 
6.3.6.  However,  "top-down"  is  recommended  because  it  forces 
the  developer  to  concentrate  on  the  semantic  characteristics 
of  the  data;  normalization  can  then  be  used  to  confirm  the 
d  esign . 
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step  6.3.1     List  entities  and  identifiers 


Function: 


Abstract  data  flows  to  determine 
enti  ties 


Output: 


List  of  entities  with  descriptions 
and  identifier 


Team  Members: 


User  -  DA  and  DBA 
Developer  -  DA  and  DBA 


Symbol ogy: 


Text 


Tool s: 


Use  DD  to  enter  entities  and  identifier 


Guidel ines: 


Be  careful  in  defining  an  entity 
and  finding  the  identifier  for  it 
Determine  entities  top-down 


6.3.2  Generate  Relationships  among  Entities. 

The  primary  function  of  this  step  is  to  examine  indivi- 
dual entities  to  see  whether  they  can  be  subdivided  into 
simpler,  related  entities,  and  to  examine  collections  of  en- 
tities to  see  whether  they  are  related  components  of  a  more 
complex  entity.  A  general  guideline  is  to  look  at  entities 
that  share  components.  For  example,  "employee"  and  "assign- 
ment" share  " employee- number ; "  obviously,  there  is  a  rela- 
tionship between  them.  The  data  dictionary  can  be  of  great 
help  in  comparing  entity  structures. 

The  following  are  examples  of  common  types  of  relation- 
ships fSUST83]: 


1.  Membership — a  collection  of  similar  secondary  enti- 
ties constitute  another,  primary,  entity.  The  fiscal 
years  in  a  five-year  plan,  the  quarters  in  a  fiscal 
year,  or  the  cities  in  a  state  are  examples  of 
membership  relationships.  The  relationship  between 
the  secondary  and  primary  can  be  expressed  by  "in," 
"of,"  or  "is  a  member  of."  The  identifier  of  the 
primary  may  be  required  to  identify  each  secondary; 
for  example,  a  city  name  may  be  ambiguous  unless  the 
state       is     identified.       The     primary    entity  would 
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include  properties  common  to  all  the  secondary  enti- 
ties, while  the  secondary  entities  would  have  unique 
properties . 

2.  Aggregation — a  collection  of  dissimilar  secondary  en- 
tities describes  another,  primary,  entity.  Generally 
all  primary  entities  are  related  to  similar  collec- 
tions of  secondary  entities.  For  example,  each  "em- 
ployee" is  described  by  the  aggregation  of  "address," 
"salary-history,"  "education,"  etc.,  which  are  them- 
selves entities.  The  relationship  between  the  secon- 
dary and  primary  can  be  expressed  by  the  phrase  "is  a 
property  of"  or  "is  a  part  of."  The  existence  of  a 
secondary  entity  is  usually  dependent  on  the  ex- 
istence of  the  primary  entity. 

3.  Generalization — each  of  a  collection  of  similar 
secondary  entities  can  be  considered  to  represent  a 
special  case  of  another,  primary,  entity.  Different 
primary  entities  may  be  related  to  different  types  of 
secondary  entities.  For  example,  "salaried-employee" 
and  "hourly-employee"  are  each  roles  of  the  primary 
entity  "employee."  The  relationship  between  the 
secondary  and  primary  can  be  expressed  by  the  phrases 
"is  a"  or  "is  a  type  of."  The  existence  of  each 
secondary  entity  may  be  dependent  on  the  existence  of 
the  primary  entity;  for  example,  every  "salaried- 
employee"  or  "hourly-employee"  must  also  be  an  "em- 
ployee." The  primary  entity  would  include  properties 
common  to  all  the  secondary  entities,  while  the 
secondary  entities  would  have  unique  properties. 


These  relationships  correspond  to  the  programming  con- 
structs of  iteration  (looping  through  the  members  of  a  col- 
lection) ,  sequence  (manipulating  one  after  another  of  the 
aggregated  properties),  and  selection  (determining  whether  a 
particular  role  is  played  by  the  entity) .  All  of  these  re- 
lationships can  be  developed  bottom-up  (from  a  given  collec- 
tion of  secondary  entities  to  the  primary),  to  produce  a 
simplified  high-level  structure,  or  top-down  (from  a  primary 
to  a  collection  of  secondaries),  to  add  more  detail. 

Another  type  of  relationship  which  is  occasionally  use- 
ful is  the  following: 
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4.  Precedence — the  existence  of  one  entity  in  the  data- 
base must  precede  the  existence  of  another  entity  in 
the  database.  For  example,  a  " pr oposed-budget"  must 
precede  an  "approved-budget;"  once  an  "approved- 
budget"  has  been  entered,  however,  its  existence  is 
independent  of  the  "pr oposed-budget . " 


Other,  more  specialized  relationships  are  discussed  in 
[SUST83] . 

Diagrams  are  recommended  as  a  convenient  way  of  commun- 
icating with  the  application  specialists.  Examples  are 
given  below. 


EXAMPLE  OF  AN  ENTITY-RELATIONSHIP  DIAGRAM 


El 


Relationship  name 


v 
v 


E2 


Figure  8 

This  example  states  that  entity  "El"  has  a  relationship 
with  another  entity  "E2."  The  single  and  double  arrows  in- 
dicate that  an  instance  of  "El"  may  be  associated  with  many 
instances  of  "E2,"  while  each  instance  of  "E2"  is  associated 
with  one  instance  of  "El." 
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ALTERNATE  NOTATION  FOR  AN  ENTITY- RELATIONSHIP  DIAGRAM 


El 


/  \ 
/  \ 
/  Rel  \ 
\Naine  / 
\  / 


E2 


Figure  9 

The  alternate  notation  is  somewhat  more  cumbersome  but 
it  does  have  the  advantage  of  emphasizing  the  importance  of 
relationships,  and  is  readily  extended  to  include  relation- 
ships among  more  than  two  entities  and  relationships  with 
attributes . 

In  general,  the  simplicity  of  labeled  lines  is  pre- 
ferred. A  relationship  among  more  than  two  entities  should 
usually  be  transformed  into  an  entity  which  has  simple  rela- 
tionships with  those  entities.     For  example. 
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REPLACING  A  RELATIONSHIP  WITH  AN  ENTITY 


rl 


V 
V 


r  3 


V 


Figure  10 


The  complex  relationship  R  has  been  replaced  by  an  en- 
tity; the  diamond  within  the  rectangle  indicates  that  R  may 
be  an  entity  on  one  diagram  and  a  relationship  on  a  less  de- 
tailed diagram.  New  relationships,  rl,  r2,  and  r3  must  be 
added  unless  they  are  obvious.  The  fact  that  an  "employee" 
uses  a  particular  "skill"  on  a  particular  "project"  would  be 
represented  by  such  a  diagram;  El,  E2 ,  and  E3  would 
represent  "employee,"  "skill,"  and  "project,"  while  R  could 
be  a  relationship  or  an  entity  identified  by  the  "employee," 
"skill,"  and  "project"  identifiers. 
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step  6.3.2 


Generate  relationships  among  entities 


Function: 


Revise  entities 


Output: 


Entities  and  relationships 


Team  Members: 


User  -  DA  and  DBA 
Developer  -  DA  and  DBA 


Symbol ogy: 


Entity-Relationship  diagrams 


Tools: 


Add  relationships  to  DD 


Guidelines: 


Look  for  common  types  of 
relationships 


6.3.3  Add  Connectivity  to  Relationships. 

The  primary  function  of  this  step  is  to  suggest  new  en- 
tities or  ways  in  which  entities  can  be  combined.  A  secon- 
dary function  is  to  provide  quantitative  data  useful  to  phy- 
sical database  design. 

Connectivity  describes  a  relationship  between  two 
entities —  how  many  instances  of  one  entity  are  associated 
with  how  many  instances  of  the  other  entity.  For  example, 
if  an  "employee"  can  have  only  one  "manager,"  but  a 
"manager"  can  manage  many  employees,  then  the  relationship 
"manages"  is  "1  to  many."  If  a  reasonably  good  number  can  be 
given  for  the  "many,"  that  may  assist  in  physical  database 
design.  However,  the  most  important  situations  for  logical 
database  design  are  the  following: 


o  Most  relationships  will  have  connectivity  "1  to  many" 
or  "many  to  1." 

o  If  the  connectivity  is  "1  to  1,"  then  the  two  enti- 
ties should  be  combined,  provided  that  the  result  can 
be  given  a  meaningful  name  and  description.  For  ex- 
ample, if  a  "project"  always  has  exactly  one 
"manager,"  and  a  "manager"  always  has  exactly  one 
"project,"     then     the     two    entities  can  be  combined. 


-69- 


(Note  the  use  of  the  vad  rd  "always."  In  the  real  world 
it  is  likely  that  there  will  be  periods  of  transition 
when  a  "manager"  has  no  "project,"  or  more  than  one 
"project,"  or  a  "project"  has  no  "manager."  In  reali- 
ty, then,  the  connectivity  might  be  "0,1  to  0,1,2," 
and  the  entities  should  not  be  combined.) 

o  If  the  connectivity  is  "1  to  0,1"  then  this  often  in- 
dicates generalization.  For  example,  the  relation- 
ship between  "employee"  and  "salaried-employee"  is  "1 
to  0,1,"  since  the  "employee"  could  be  an  "hourly- 
employee."  The  "salaried-employee"  entity  cannot  ex- 
ist unless  the  "employee"  entity  exists. 

o  If  the  connectivity  is  "many  to  many"  (or  numbers  in- 
dicating a  similar  situation) ,  then  the  relationship 
should  be  replaced  by  an  entity.  For  example,  if 
there  is  a  "many  to  many"  relationship  between  "em- 
ployee" and  "manager"  (i.e.,  matrix  management),  then 
a  new  entity,  such  as  "assignment  of  employee  to 
manager"  should  be  created,  and  the  "many  to  many" 
relationship  replaced  by  two  "1  to  many"  relation- 
ships. This  leads  to  more  entities  but  simplifies 
relationships  and  also  simplifies  the  mapping  of  the 
logical  database  design  into  a  conventional  data 
model. 


An  example  of  a    diagram    with    connectivity     is  shown 
be  1  ow . 
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EXAMPLE   OF  AN  ENTITY-RELATIONSHIP   DIAGRAM  WITH  CONNECTIVITY 


El 


"  1 
V 

V  many 
E2 


Figure  11 


Step  6^. 2*2  connectivity  to  relationsh ips 

Function:  Determine  connectivity  and  provide 

quantitative  data  to  physical 
database  design 

Output:  Annotated  relationships 

Team  Members:  User  -  DA  and  DBA 

Developer  -  DA  and  DBA 

Symbology:         Extended  E-R  diagrams 

Tools:  Add  connectivity  information  to  DD 

Guidelines:       Eliminate  1  to  1  and  many  to  many 
relationships 
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6.3.4  Add  Attributes  to  Entities. 

The  primary  function  of  this  step  is  to  add  detail  to 
the  entity  descriptions  in  the  data  dictionary  and  diagrams. 
Two  strategies  are  possible: 


1.  If  there  is  a  collection  of  known  attributes  (e.g., 
data  elements) ,  then  this  step  can  be  performed 
"bottom-up."  Each  attribute  is  assigned  to  an  entity 
(or  entities)  which  identifies  a  unique  instance  of 
that  attribute.  If  no  entity  is  appropriate,  one  is 
created,  relationships  are  developed,  and  so  on. 

2.  This  step  can  be  performed  "top-down"  by  examining 
each  entity  to  determine  appropriate  descriptors. 
This  procedure  is  recommended  during  high-level 
iterations,  when  attributes  are  data  collections 
rather  than  data  elements. 


The  attributes  are  represented  in  the  data  dictionary  by  be- 
ing "contained  in"  an  entity  [FIPS80]  ,  and  in  the  diagrams 
by  some  notation  such  as  that  in  the  following  example, 
where  "Al"   is  the  attribute: 
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EXAMPLE  OF  AN   ENTITY-RELATIONSHIP-ATTRIBUTE  DIAGRAM 


El 


/  \ 
Al  I 

\  / 


V  0,1 


E2 


Figure  12 

The  relationship  S  could  be  an  agreed-upon  symbol  to 
indicate  that  E2  is  a  subtype  of  the  entity  El. 

Another  function  of  this  step  is  to  simplify  the  CS  by 
eliminating  unnecessary  entities.  The  rule  for  doing  this 
is  very  simple: 


o  If  an  entity  is  single-valued  in  every  relationship 
with  other  entities,  then  it  can  be  eliminated  by 
moving  its  attributes  (including  the  identifier)  into 
those  entities. 


For  example,  suppose  that  "hourly-pay-scale"  is  an  en- 
tity with  the  attribute  and  identifier  "dollar-amount,"  and 
its  only  relationships  are  "many  to  1"  from  "salaried- 
employee"  and  "hourly-employee"  to  "hourly-pay-scale."  Then 
"dollar-amount"  should  be  assigned  to  "salaried-employee" 
and  "hourly-employee,"  and  "hourly-pay-scale"  should  be  el- 
iminated. The  justification  is  simple:  "dollar-amount"  is 
single-valued  in  every  relationship,  so  it  acts  like  a 
descriptor — i.e.,  an  attribute. 
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step  6.3.4     Add  attributes  to  entities 


Function: 


Add  attributes  to  the  entity 
descr  ipt ions 


Output : 


E-R-A  diagrams 


Team  Members: 


User  -  DA  and  DBA 
Developer  -  DA  and  DBA 


Symbology: 


E-R-A  diagrams 


Tools: 


Add  attributes  to  DD 


Guidelines: 


Simplify  by  eliminating  unnecessary 
en ti  ties 


6.3.5  Develop  Additional  Data  Characteristics. 

The  function  of  this  step  is  to  add  additional  con- 
straints, such  as  security  and  integrity,  to  the  entity  and 
relationship  descriptions  in  the  data  dictionary.  These 
constraints  are  important  but  are  not  easily  represented  on 
a  diagram;  the  recommendation  is  to  keep  the  diagrams  simple 
by    representing  these  constraints  only  in  the  data  diction- 


ary. 
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step  6.3.5^     Develop  additional  data  characteristics 


Function: 


Add  security,  integrity,  and  other 
constrai  nts 


Output : 


E-R-A  diagrams  and  updated  DD 
with  detailed  description  of  data 


Team  Members: 


User  -  DA  and  DBA 
Developer  -  DA  and  DBA 


Symbology: 


E-R-A  diagrams 


Tools: 


Add  constraints  to  DD 


Guidelines:       Keep  the  diagrams  simple 


6.3.6  Normalize  the  Collection  . 

The  primary  function  of  this  step  is  to  ensure  that  the 
collection  of  entities  is  optimal   in  the  following  sense: 


1.  Each  non-key  attribute  is  identified  only  by  the  sim- 
plest possible  identifiers.  For  example,  "supplier- 
address"  should  not  be  in  a  "supplier-part"  entity 
(identified  by  the  combination  of  "supplier-name"  and 
"part-number")  if  "supplier-address"  is  uniquely 
identified  by  "supplier-name"  alone. 

2.  Redundant  non-key  attributes  are  eliminated.  For  ex- 
ample, if  the  "branch"  entity  contains  "division#" 
and  "department!,"  and  the  "division"  entity  (identi- 
fied by  "division!")  also  contains  "department!," 
then  "department!"  can  be  eliminated  from  "branch." 
The  "department!"  can  be  determined  from  the  unique 
"division"  entity  identified  in  the  "branch,"  so 
"department!"   is  redundant  in  "branch." 

3.  Entities  with  the  same  identifier  are  combined. 

4.  Entities  with  equivalent  identifiers  (identifiers 
that  identify  each  other)   are  combined. 
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The  first  two  conditions,  plus  the  condition  that  attributes 
are  single-valued  (which  was  required  in  step  6.3.4),  are 
sufficient  to  ensure  that  the  entities  are  in  Third  Normal 
Form  [BERN76].  The  third  and  fourth  conditions  ensure  that 
the  entities  are  in  the  more  rigorous  Elementary  Key  Normal 
Form  (EKNF)  [ZANI82],  which  minimizes  the  total  number  of 
entities.  A  computer  algorithm  to  obtain  EKNF  is  described 
in  [BEER79 ,  BERN76];  the  proofs  of  correctness  and  minimali- 
ty are  complex,  but  the  algorithm  itself  is  quite  simple. 

Commercially  available  programs  perform  various  levels 
of  normalization  [MART77].  A  good  program  should  interface 
to  a  data  dictionary  to  obtain  identifiers  and  the  attri- 
butes that  they  identify,  and  should  provide  EKNF  as  well  as 
various  reports,  traces,  and  diagrams.  The  objective  of  the 
preceding  steps  of  this  phase  is  to  do  such  a  good  job  of 
identifier  analysis  that  the  normalization  program  will  pro- 
duce exactly  the  entities  that  are  input  to  it.  Experience 
indicates  that  discrepancies  between  the  input  and  output 
entities  are  often  caused  by  more  serious  and  subtle  errors 
than  those  found  by  the  normalization  program;  the  program 
exposes  errors,  but  its  "corrections"  are  sometimes  diffi- 
cult to  understand,  and  should  not  be  accepted  without 
thorough  analysis.  A  normalization  program  should  definite- 
ly not  be  used  as  a  substitute  for  careful  thought. 


Step  6^.2*^    Normalize  the  collection  of  entities 

Function:  Remove  redundancies  and  detect  errors 

Output:  Normalized  entities 

Team  Members:  User  -  System  analyst  and  DBA 
Developer  -  DA  and  DBA 

Tools:  Normalization  program 

Guidelines:       Careful  manual  analysis  as  well  as  use 
of  the  automated  tools 
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7.      EXTERNAL   SCHEMA  MODELING 


An  External  Schema  (ES)  is  a  subschema  (part)  of  a  Con- 
ceptual Schema  (CS)  that  is  relevant  to  a  Local 
Information- flow  Model  (LIM) .  A  LIM,  in  turn,  represents 
the  information  requirements  of  a  user,  group  of  users,  ap- 
plication program,  or  application  system.  An  ES  includes 
all  entities,  relationships,  and  attributes  needed  by  the 
LIM.  Local  names  are  possible — for  example,  the  Conceptual 
Schema  may  have  an  entity  called  "employee-number"  which  is 
"emp-no"  in  the  personnel  ES.  An  ES  reflects  the  way  infor- 
mation is  used  by  an  individual  task  or  decision. 


7.1     Information  Used  to  Develop  the  ES 


The  primary  sources  of  information  needed  to  develop  an 
ES  are  the  CS  and  the  relevant  LIM  as  represented  in  the 
data  dictionary.  If  the  LIM  is  inadequate  in  scope  or  de- 
tail, then  it  should  be  expanded  using  additional  informa- 
tion from  the  sources  listed  in  section  4.1. 


7.2     Functions  of  the  ES 


The  primary  function  of  an  ES  is  to  help  users  and  pro- 
grammers interact  with  the  database  by  presenting  a  simpli- 
fied view  of  the  database  in  terms  which  are  familiar  to 
them.     An  ES  has  the  following  secondary  functions: 


1.  Detailed  iterations  of  the  ES  provide  one  of  the  in- 
puts to  physical  database  design — they  describe  the 
workload,  originally  developed  in  terms  of  LIMs,  in 
terms  of  the  CS. 

2.  An  ES  is  a  piece  of  the  CS  which  can  be  assigned 
privacy  and  security  locks  during  physical  database 
design  and  implementation  phases. 

3.  An  ES  provides  quality  control  of  the  CS — if  the  ES 
cannot  be  constructed  from  the  CS ,  then  the  CS  is  in- 
complete. Also,  if  there  are  portions  of  the  CS 
which  are  not  required  by  any  ES,  then  those  portions 
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are  unnecessary  or  are  information  sources  that  are 
not  being  utilized  by  any  LIMs.  During  the  early 
iterations  of  the  logical  database  design  process  the 
ESs  will  be  useful  only  for  comparing  high-level 
descriptions  of  very  general  categories  of  data 
(e.g.,  data  needed  for  the  support  of  management  de- 
cisions) ,  since  the  relevant  LIMs  will  be  based  on  an 
organizational  perspective  and  will  not  have  much  de- 
tail. In  addition,  the  LIMs  may  not  indicate  what 
information  is  to  be  in  the  database  and  what  is  to 
be  provided  by  some  other  source.  During  later 
iterations,  the  ESs  will  provide  a  much  more  accurate 
means  for  ensuring  CS  quality. 


7.3     Procedure  for  Developing  the  ES 


Figure  13  shows  the  three  sequential  steps  in  the 
development  of  the  ES.  The  steps  are  described  in  the  fol- 
lowing paragraphs. 
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EXTERNAL  SCHEMA  [ES]  MODELING  PROCEDURE 


step  7.3. 1 


EXTRACT  ES  FROM 
CS 


Step  7.3.2 


DEVELOP  WORKLOAD  WITH 
RESPECT  TO  ES 


Step  7.3.3 


ADD  LOCAL  CONSTRAINTS 
TO  ES 


FIGURE  13 
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7.3.1  Extract  an  ES  from  the  CS . 


The  primary  function  of  this  step  is  to  decide  what 
parts  of  the  CS  are  required  by  a  particular  LIM.  First, 
data  flows  must  be  classified  into  those  requiring  data  from 
the  database  and  those  that  are  independent  of  the  database 
[JEFF82].  The  data  collection  may  be  obtained  from  or 
stored  in  a  private  file  or  other  non-database  location  if 
any  of  the  following  are  true: 


1.  The  data  collection  is  of  interest  to  only  a  single 
user  or  application  and  therefore  need  not  be  shared. 

2.  The  data  collection  is  transitory,  as  in  a  temporary 
working  file,  and  would  not  exist  long  enough  to  be 
relevant  to  other  users  or  applications. 

3.  The  data  collection  is  incomplete  or  inconsistent,  as 
in  a  partially  completed  update,  or  consists  only  of 
references  or  keys  to  other  data,  as  in  a  file  of 
references  to  data  of  particular  interest  to  decision 
suppor  t . 


In  general,  a  data  collection  should  be  obtained  from  or 
stored  into  the  database  if  all  of  the  following  are  true: 


1.  The  data  collection  is  of  interest  to  many    users  or 
applications  and  should  therefore  be  shared. 

2.  The  data  collection     is     sufficiently     long-lived  to 
have  many  uses. 

3.  The  data  collection  represents  a  consistent,  complete 
view  of  the  real  world. 


There     are     then    two     situations     that     can     be  dis- 
tinguished : 
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o  This  LIM  is  not  a  part  of  any  LIM  for  which  an  ES  has 
already  been  constructed.  For  example,  this  LIM 
might  be  a  top-level  organization,  function,  or 
event.  In  this  case,  the  ES  will  consist  of  high- 
level  entities,  relationships,  and  attributes  from 
the  CS .  If  a  Data  Dictionary  System  (DDS)  is  avail- 
able, it  should  be  employed  to  extract  only  high- 
level  data  objects.  These  objects  will  then  be  manu- 
ally compared  with  the  data  flows  of  the  LIM  to 
determine  what  parts  of  the  CS  are  needed  by  the  LIM. 

o  Alternatively,  this  LIM  is  a  part  of  a  higher-level 
LIM  for  which  an  ES  has  already  been  constructed. 
For  example,  this  LIM  may  be  a  part  of  a  function  for 
which  there  is  an  ES.  In  this  case,  the  ES  is  based 
on  the  higher-level  ES.  The  DDS  should  be  used  to 
extract  the  data  objects  relevant  to  the  higher-level 
ES,  and  the  lower-level  data  objects  which  are  con- 
tained within  them.  The  resulting  collection  of  data 
objects  must  then  be  compared  with  the  data  flows  of 
the  LIM  to  verify  that  all  data  required  by  the  LIM 
is  in  the  higher-level  ES,  or  is  a  part  of  some  data 
object  in  the  higher-level  ES  (the  DDS  can  greatly 
reduce  the  effort  involved  in  this  comparison).  If 
not,  the  higher-level  ES  must  be  extended  to  include 
the  missing  data.  The  lower-level  ES  will  then  con- 
sist of  the  relevant  parts  of  the  higher-level  ES 
plus  additional  entities,  relationships,  and  attri- 
butes required  by  the  more  detailed  level  of 
analysis . 

The  final  result  of  this  step  is  a  diagram  of  selected 
parts  of  the  CS  plus  additional  entries  in  the  data  diction- 
ary to  relate  the  selected  data  to  the  LIM. 
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step  7.3.1    Extract  an  ES  from  the  CS 


Function: 


Decompose  CS  based  upon  the 
particular  LIM 


Output: 


Decomposed  E-R-A  diagram 


Team  Members: 


User  -  Programmers,  analysts,  and  DBA 
Developer  -  DA  and  DBA 


Symbology: 


E-R-A  diagrams 


Tool s: 


Use  DD  to   relate  data  to  LIM 


Guidelines:       Verify  the  extracted  ES  with  LIM 


7.3.2  Develop  Workload  With  Respect  to  ESs. 

The  primary  function  of  this  step  is  to  translate  the 
workload,  originally  developed  in  terms  of  data  flow  in  the 
LIM,  into  data  access  and  update  in  the  ES.  The  preceding 
step  determined  what  parts  of  the  database,  if  any,  are  re- 
quired for  each  data  flow,  while  step  4.3.5  determined  the 
frequency,  sequence,  and  selectivity  with  which  each  func- 
tion uses  and  updates  data.  Therefore,  this  step  involves 
two  alternatives  for  each  data  collection  in  the  LIM  work- 
load sequence: 


o     If  the  data  collection  is     not     database    data,  then 
nothing  need  be  done. 

o  If  the  data  collection  is  database  data,  then  an  ap- 
propriate access  path  must  be  determined.  That  is, 
given  the  data  available  at  that  point  in  the  se- 
quence, what  entities  and  relationships  must  be  ac- 
cessed to  arrive  at  the  required  entities?  If  a  path 
cannot  be  found,  there  is  an  error,  which  must  be 
corrected  by  modifying  the  LIM  (e.g.,  by  revising  the 
workload),  modifying  the  partially  completed  ES 
(e.g.,  by  changing  the  distribution  of  database  and 
non-database  data),  or  modifying  the  CS  (e.g.,  by  ad- 
ding a  new  relationship).  If  a  path  can  be  found,  it 
is  added  into  the  workload  sequence  for  the  ES. 
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The  resulting  database  workload  should  be  represented 
in  the  data  dictionary  by  a  sequence  of  programs  or  modules 
interacting  with  the  database  objects.  Three  kinds  of  in- 
teractions with  entities  must  be  represented: 


o  Data  use — an  entity  instance  is  accessed  because 
various  attributes  are  needed  for  some  computation, 
report,  or  control  purpose. 

o    Data  update — an  entity  instance  is  added  or  modified. 

o  Data  access — an  entity  instance  is  part  of  a  path  but 
has  no  directly  relevant  attributes.  The  entity 
might  be  removed  from  the  path,  with  an  improvement 
in  database  performance,  if  the  Internal  Schema  has 
an  appropriate  relationship  to  bypass  the  entity. 


As  noted  in  step  number  4.3.5,  there  are  two  types  of  in- 
teractions with  attributes: 


o  Entity  retrieval  —  an  attribute  is  needed  to  determine 
whether  an  entity  instance  is  needed  by  the  function. 

o  Attribute  selection — an  attribute  instance  is  re- 
quired for  a  computation,  report,  control,  or  update 
purpose. 


There  is  one  type  of  interaction  with  relationships: 


Path  component — the  relationship  is  part  of  a  path. 
Note  that  the  direction  is  important. 


The  paths  may  also  be  represented  graphically  by  an  overlay 
on  an  ES  or  CS  diagram  [MART84,  MCCL84,  SUST84].  This  pro- 
vides a  simple  representation  that  can  be  easily  understood 
and  verified  by  application  specialists,  but  is  not  a  sub- 
stitute for  the  data  dictionary. 
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step  7.3^.2^    Develop  workload  with  resp ec t  to  ES 
Function:  Specifications  for  physical  design 

Output:  Workload  specifications 

Team  Members:  User  -  Programmers,  analysts,  and  DBA 
Developer  -  Analysts,  DA  and  DBA 

Symbology:         E-R-A  diagram  with  path  overlay 

Tools:  Update  DD  to  add  workload  information 

Guidelines:       Identify  access  path  to  avoid  errors 


7.3.3  Add  Local  Constraints  to  the  ES. 

The  purpose  of  this  step  is  to  add  any  unique  con- 
straints imposed  on  or  by  the  LIM.  Examples  of  such  con- 
straints include  security  and  privacy  restrictions,  local 
rules  for  edit  and  validation,  and  local  integrity  con- 
str  aints . 


Step  7.2-2  local  constraints  to  the  ES 

Function:  Add  local  constraints  to  each  ES 

Output:  Updated  E-R-A  diagrams  and  updated  DD 

Team  Members:  User  -  Programmers,  analysts,  and  DBA 
Developer  -  DA  and  DBA 

Symbology:         E-R-A  diagrams 

Tools:  Update  DD  to  add  constraints 

Guidelines:       Identify  unique  constraints  imposed 
on  or  by  the  LIM 
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8.  CONCLUSIONS 


This  report  presents  a  Logical  Database    Design  metho- 
dology with  the  following  characteristics: 


o  There  are  four  phases:  Local  I nf ormat ion- f low  Model- 
ing, Global  Information-flow  Modeling,  Conceptual 
Schema  Design,  and  External  Schema  Modeling. 

o  The  phases  are  executed  iteratively  to  control  com- 
plexity and  to  provide  a  means  for  verifying  the 
results  of  the  different  phases  against  one  another. 

o  Analysis  is  performed  from  different  points  of  view 
(organization,  function,  and  event)  in  order  to  en- 
sure that  the  logical  database  design  accurately  re- 
flects all  reasonable  information  requirements  of  the 
organi  zation. 

o  The  methodology  recommends  computer  support  from  a 
Data  Dictionary  System,  in  order  to  conveniently  and 
accurately  handle  the  volume  and  complexity  of  design 
documentation  and  analysis,  and  to  provide  ready  ac- 
cess to  work  already  accomplished. 

o  Logical  database  design  is  integrated  into  the  com- 
plete system  life  cycle. 


The  purpose  of  this  methodology  is  to  assist  in  the 
design  of  very  large  and  complex  information  systems,  where 
the  effects  of  poor  logical  database  structures  can  result 
in  expensive,  time-consuming  system  development  efforts 
whose  end  results  are  ineffective  and  inefficient.  The 
methodology  emphasizes  both  the  need  for  speed,  so  that  the 
design  will  be  completed  in  time  to  be  useful,  and  the  need 
for  quality  control,  to  ensure  that  the  design  is  con- 
sistent, complete,  and  satisfies  the  eventual  users. 
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APPENDIX  A 


Agency  Financial  Management  System 


I 

Ij 


1 


INTRODUCTION 


A  Federal  agency  is  designing  a  financial  management 
system.  None  of  the  applications  systems  offered  by 
software  vendors  seem  to  gracefully  accommodate  the  agency^ s 
code  structure  and  its  cost  accounting  procedures  for  its 
reimbursable  divisions.  As  a  matter  of  fact,  although  the 
individuals  on  the  team  surveying  these  packages  are  each 
expert  in  a  particular  subject  area,  they  lack  a  good  over- 
view of  what  their  agency's  requirements  are,  or  should  be. 

A  primary  objective  of  the  design  effort  is  to  gain  an 
organizational  perspective  of  the  agency's  financial  data. 
The  logical  database  design  can  then  be  used  to  develop  a 
system  (either  in-house  or  on  contract),  purchase  a  system 
(once  requirements  are  understood)  or  specify  modifications 
which  would  be  needed  if  a  system  were  purchased  from  a  ven- 
dor or  obtained  from  another  agency. 

An  important  consideration  in  the  logical  database 
design  project  is  that  the  agency's  appropriation  from 
Congress  constitutes  only  63%  of  the  operating  budget.  Ad- 
ditional income  is  provided  by  contracts  with  other  govern- 
ment agencies  and  the  sale  of  goods  and  services  to  the  pub- 
lic sector.  The  financial  management  system  must  be  able  to 
charge  back  costs  to  customers.  Another  important  con- 
sideration is  that  there  is  an  existing  payroll  system  which 
must  interface  with  the  financial  management  system. 

An  example  of  a  reimbursable  division  is  Instrument  Fa- 
brication Division,  IFD,  whose  income  from  services  to  other 
government  agencies  represents  8%  of  the  agency's  budget. 
IFD  relies  on  other  divisions  within  the  agency  for  func- 
tions such  as  procurement  and  accounting.  IFD  finances  all 
management  and  support  services  by  applying  a  fixed-rate 
surcharge  to  the  labor  base  in  some  of  its  own  units. 

The  following  examples  are  intended  to  show  some  of  the 
types  of  documentation  which  are  gathered  or  produced  in  a 
logical  database  design. 

These  examples  have  been  simplified  so  that  the  amount 
of  detail  does  not  obscure  the  intent  of  the  example.  How- 
ever, in  some  instances  enough  detail  is  left  in  so  that  the 
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reader  may  appreciate  the  sheer  volume  of  the  items  of  in- 
formation to  be  gathered,  analyzed  and  organized  in  logical 
database  design.  The  result  is,  unfortunately,  an  uneven 
level  of  detail. 

Even  the  sample  system  chosen,  "Agency  Financial 
Management  System,"  is  limited  in  scope,  showing  some  as- 
pects of  normal  in-house  financial  management  for  a 
service-oriented  agency.  Other  federal  agencies,  whose  mis- 
sion is  to  administer  or  disburse  government  funds,  would 
consider  this  example  system  a  minor  subsystem.  In  general, 
logical  database  design  for  financial  management  should  con- 
sider the  unique  mission  of  the  agency  and  the  extent  to 
which  financial  data  can  be  used  to  support  that  mission. 
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INSTRUMENT  FABRICATION  DIVISION 
Oi^gaiiizational  Qiart 


MANAGEMENT 


ESTIMATES 


DESIGN 


OPERATIONS 


MANUFACTURING 


CALIBRATIONS 


MISSION 


The  mission  of  instrument  Faorlcatlon  Division  Is  to  design  ana  manufacture  nign-preclsloa 
one-of-a  kind  instruments  in  support  of  the  agency's  scientific  research  divisions.  This 
service  is  avail^le  to  other  government  agencies  as  well  as  the  public.  All  instruments 
are  manufactured  on  a  reimbursable  basis. 
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INSTRUMENT  FABRICATION  DIVISION 
High  Level  Local  Information-flow  Model 


CUSTOMER 


Plans 
Orders 
C<m  tracts 


Labor  hours 
distribution 
Billing  informrtion 
Purchase  order 


paynent 

authorization 

j  ACXXXMTING 

1 

'  Accounting  reports 

Estlnates 

Design  specifications 
Status  r^orts 


INSTRUMENT 
FABRICATION 
DIVISION 


Tine  cards 


PAYROLL 


Purchase  order 
receiving  reports 


Quotes  on' 
naterials  and 
equifment 


Requisition  for 
naterials  and 
equipnent 


SHIPPING 

AND 
RECEIVING 


VEMXDR 


PROCUREMENT 
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INSTRUMENT  FABRICATION  DIVISION 
Local  Information-flow  Model 
ESTIMATES  Unit 


CUSTOMER 


CostVtine 
estlnates 


Plans 


Quotes  on 
naterials  prices 


ESTIMATES 


VENDCR 


Cost/tine 
estinates 

Plans 

Purchase  order 
Infornation  t 


Ldbor  rates 


MANAGEMEhJT 


NOTES 

Estimates  are  free  to  customers.  The  ESTIMATES  unit  is  not  reimtxjrsed  direcUy 
for  services. 
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INSTRUMENT  FABRICATION  DIVISION 

Local  Information-flow  Model 
OPERATIONS  Unit 


hWJAGEMENT 


-  Approved  plans 

'  Priority  list 

•  Design 

specifications 

DESIGN 

•  Hflterials 

list 

OPERATIONS 


Task  plans 


MANUFACTURING 


Project  plans 
Progress  reports 
Project/enployee  hours  sunnary 
Tifte  cards 

Equipnent  requisitions 
naterlals  purchase  orders 


Enployee/project  tine  cards 
Task  status 
llaterials  usage  log 


CALIBRATIONS 


NOTES 

OPERATIONS  is  responsible  for  coordinating  the  efforts  of  MANUFACTURING  and 
CALIBF^TIONS,  scheduling  tasks,  ordering  materials  and  equipment,  reporting 
material  and  labor  spent  on  each  project 
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INSTRUMENT  FABRICATION  DIVISION 
Local  Information-flow  Model 

Function  :  Close  Out  Work  Order 


■%'////  naterials  '//M// 
■/A//,,:   W/A 


'^9^.  tiork  order  '^/y/^// 

■  m 


Notice  of 
conpletion 


Itenized 

bill 
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AGENCY  FINANCIAL  MANAGEMENT  SYSTEM 
Global  Information-flow  Model 


PUBLIC 
SECTOR 


TREASURY 
DEPARTMOn" 


BUDGET 
OFFICE 


PAYROLL 
OPERATIONS 


Financial 
reports 
Schedule  of 
paynents 


Payroll 
reports/t^ 


Invoices 


VEHDORS 


Paynents 


OTHER 
AGQiCIES 


Authorizations 
Ad  hoc 
requests 


Bills 


ACCOUNTING 


Purchase 
orders 


.  Labor  Hours 
-  Billing  info 
•  Paynent 
authorization 


Accounting 
reports 


Status  reports 


REIMBURSABLE 
OPERATIOKS 


Accounting 

r^rts 


Re^iisitions 


Obligations 


Labor  tours 
Payment 
outtwrization 


PROCURIMElfr 


t 


Requisitions 


APPROPRIATED 
TECHNICAL 
OPERATICWS 


Boundajiy  of  Autc«atic«i 
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AGENCY  FINANCIAL  MANAGEMENT  SYSTEM 

ENTITY-RELATIONSHIP  DIAGRAM 
OF  CONCEPTUAL  SCHEMA 


NOTES: 


Ncai-key  attributes  are  not  showi. 

Vaita.  dictiOTiary  reports  list  all  attriljutes. 
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AGENCY  FINANCIAL  MANAGEMENT  SYSTEM 
EXTERNAL  SCHEMA 

Function  :  Close  Out  Work  Order 


^DIV.  ID^ 


lOVEMEAD 


pDIV.  ID!#- 


iCOST 


DIVISICN 


1  r  O,* 


LABOR 
CATEGORY 


PURCHASE 
ORDER 


0,m 


«oescr2 


0,m 
— ^ 


0,1 




u.o.  #^ 


FIXED/  i 
ACTUAL' 


WORK 
ORDER 


0.1 


.0,1 


CUST.  ID 


VORK 
ORDER 
TASK 


PROFIT/ 

LOSS 
TRANSFER 


PROJECT 
TIME 
CARD 


%CUST.  ID 


:  DIV.  ID# 


U.OrBflL|p 


NOTE  :     Entities,  relationships  and  attributes  not  used  by  this  functic«i  are  riot 
shown.    Coiiiplete  detadls  are  available  frc«i  the  data  dictic«»ry. 
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EXTERNAL  SCHEMA  "OVERLAY" 
WORKLOAD  FOR  FUNCTION 

"Close  Out  Work  Order" 
Biweekly  Statistics  for  All  Reimbursable  Divisions 


UPDATE 

KEY  =  U.O.  # 


DIV.ID,  CUST.ID. 

EST. COST,  FIXED/ACTUAL 


13 

HABOR  CATEGORY  ^ 

USE 

KEY  =  OIV.IO 

USE 

KEY  =  DIV. ID 

DIV  OVERHEAD  RATE 

LABOR  CODE 
LABOR  RATE 

^  .  ,  0. 

4 

1  PROJECT  TDIE  CARD^, 

ACCESS 

KEY  =  U.O.  » 

USE 

KEY  =  U.O.  #  ♦  TASK  # 

TASK  ff 

T.C.  HOURS 
LABOR  CODE 

1.5 


PURCHASE  ORDER 

^  P.O.  LIME  ITB1|ri?7.^;:*T^^ 

ACCESS 

7 

USE 

KEY  =  U.O.  # 

KEY  =  P.O.  # 

P.O.  # 

COST 

.25 


(^TE 

KEY  =  DIV. ID  ♦  U.O.  # 


U.O.BAL 


.75 


iC.I.  LINE  ITB1 





UPDATE 

KEY  =  CUST.IO  *  U.O.  # 


OIV.IO 
U.O.BAL 


LEGEND 


FREgUENCY 


ENTITY  NflHE 


SEO 


entity  USE.  UPDATE  or  ACCESS 
KEY  =  access  key 


List  of  other  attributes 
used  by  the  process 


1 
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DATA  DICTIONARY  DISPLAY 

WORKLOAD  FOR  FUNCTION 


F1012-CLOSE-OUT-WORK-ORDER  MODULE 

 CLASSIFICATION  CATEGORY  

10 

DB-PROCESS 

 DESCRIPTION  CATEGORY  

10 

-  TRIGGERED  BY  RECEIPT  OF  CLOSE-OUT  TICKET 

20 

-  COMPUTES  FINAL  COST  OF  WORK  ORDER 

30 

-  TRANSMITS  WORK  ORDER  BALANCE    (ADVANCE  PAYMENT 

40 

MINUS  COST)   TO  COST  ACCOUNTING  AS  EITHER  A 

50 

PROFIT/LOSS    (FOR  FIXED-PRICE)    OR  CUSTOMER  REFUND/ 

60 

AMOUNT-DUE  TRANSACTION    (FOR  ACTUAL- PR ICE ) . 

 RELATIONAL  CATEGORY  

CATALOGUE  NAME 

1010 

F1012-WORK-ORDER 

ACCESS:   TYPE=UPDATES                          FREQ"  56 

56 


P1012 

-WORK-ORDER  MODULE 

 RELATIONAL  CATEGORY- 

CATALOGUE  NAME 

10 

WORK -ORDER-NUMBER 

ACCESS:  TYPE=READS 

20 

DIVISION-ID 

ACCESS:  TYPE=READS 

30 

CUSTOMER- ID 

ACCESS:  TYPE=READS 

40 

W-O-ESTIMATED-COST 

ACCESS:  TYPE=READS 

50 

W-O- ACTUAL-COST 

ACCESS:  TYPE=<:rEATES 

60 

W-O-FIX ED- ACTUAL- INDICATOR 

ACCESS:  TYPE=READS 

70 

W-O- DATE-C  OMPLET  ED 

ACCESS:  TyPE=CREATES 

10 10 

F1012-DIVISION 

ACCESS:  TYPE=READS 

FREQ- 

1 

10  20 

FIO 12-WORK-ORDER-TASK 

ACCESS:  TYPE=READS 

FREQ- 

6 

10  30 

F1012-PURCHASE-ORDER 

ACCESS:  TYPE=READS 

FREQ=  1 

50 

1040 

F 10 1 2 -PROFI T- LOSS-TRAN  SF  ER 

ACCESS:  TYPE=CREATES 

FREQ« 

.25 

1041 

*     EXECUTE  FOR  FIXED-PRICE  WORK 

ORDER 

10  50 

F 10  12-CUST- INVOICE-LINE- ITEM 

ACCESS:  TYPE=CREATES 

FREQ= 

.7  5 

1051 

*     EXECUTE  FOR  ACTUAL-PRICE  WORK  ORDER 

F1012 

-WORK-ORDER-TASK  MODULE 

 CLASSIFICATION  CATEGORY  

10 

ACCESS-ONLY 

 RELATIONAL  CATEGORY  

CATALOGUE  NAME 

10 

WORK-ORDER-NUMBER 

ACCESS:  TYPE=READS 

20 

TASK -NUMBER 

ACCESS:  TYPE=READS 

1010 

FIO  12-PROJBCT- TIME-CARD 

ACCESS:    TYPE=READS                              FREQ  = 

4 

F1012-PROJECT-TIME-CARD  MODULE 
 RELATIONAL  CATEGORY  


CATALOGUE  NAME 

10 

WORK -ORDER-NUMBER 

ACCESS:  TYPE=READS 

20 

TASK-NUMBER 

ACCESS:  TYPE=READS 

30 

TIME- CARD -HOURS 

ACCESS :    TY  PE  =READS 

40 

DIV-LABOR-CODE 

ACCESS:  TYPE=READS 

.25 


F1012-PROFIT-LOSS-TRANSFER  MODULE 
 RELATIONAL  CATEGORY  

CATALOGUE  NAME 

10  DIVISION-ID 

ACCESS:  TYPE-CREATES 
20  WORK-ORDER-NUMBER 

ACCESS:  TYPE=<REATES 
30         W-O- BALANCE 

ACCESS:  TYPE=<REATES 
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INDENTED  INDEX 
EXTERNAL       SCHEMA       FOR  FUNCTION 

F1012-CLOSE-OUT-WORK-ORDER 


RELATIVE  LEVEL/DATA  CATALOGUE  NAME 

ENTRY  TYPE 

PAGE 

F 10 12-CLOSE-OUT-WORK-ORDER 

MODULE 

2 

.   FIO 12-WORK-ORDER 

MODULE 

3 

WORK  -ORDPR— NTTMRFR 

FT.RMENT 

XJ  XI  Xla*l  X-ii^  X 

4 

•I 

Pr.F.MPNT 

CUSTOMER- ID 

ELEMENT 

6 

PT.PMPNT 

7 

XjXJ  XiU.  i  XJI^  X 

W-O-FIXED-ACTUAL-INDICATOR 

ELEMENT 

9 

•        9       "      ^      LJ  t\  X  Jj          Wl  1  ST  LJ  lit  X  XJJL/ 

FT.PMPNT 

XjXj  Xil'lX-il^  X 

1  0 

X  Vx 

.    .   ^10 12-DIVISION 

MODULE 

X  1  XX  XX  W  XJ  XJ 

11 

.    .    .  DIVISION-ID 

ELEMENT 

12 

.    .    .    D IV- OVERHEAD- RATE 

ELEMENT 

XJ  XJ  I  ii  IXjLl  X 

13 

.    .    .   FIO 12-D IV- LABOR-CATEGORY 

MODULE 

14 

.    .    .    .  DIVISION-ID 

ELEMENT 

15 

.    .    .    .    D IV- LABOR-CODE 

ELEMENT 

Xj  XJ  1  ^1  AXJL^  Xi 

16 

.    .    .    .    D IV- LABOR- RATE 

ELEMENT 

17 

.    .   ^10 12-WORK-ORDER-TASK 

«             •           —            w    iJIb  Am          w  1              VAi  A          X^X  VX^  XJ  X  \          X.  X  X         X  V 

MODULE 

18 

.    .    .    WORK -ORDER-NUMBER 

•          •          •         T  V  XX  X  VX\        X^XxXX  X^  XX       X  V  \J  X  X^JX^  Xx 

ELEMENT 

X-iXJ  1  M   lXJJ-«  ^ 

19 

.    .    .   TASK -NUMBER 

ELEMENT 

20 

.    .    .   FIO 12-PROJECT-TIME-CARD 

MODULE 

21 

.    .    .    .   WORK -ORDER- NUMBER 

•          •          •          •         T  «  ^/XXX\        \-'  X  XXa/       XX       X^  V_/  1.  X       *—t  X  X 

ELEMENT 

22 

.    .    .    .   TASK -NUMBER 

•             •             •             %            ^  X  X  kX  1,X         i.^  XX  X  A  IlJ  *~t  X  X 

ELEMENT 

23 

.    .    .    .  TIME-CARD-HOURS 

*          a          •          •          X^LX  X-J            XX  XXX^        XXXX  W  Xx^x 

ELEMENT 

XJ  ■  Jl    A  XJX  1  ^ 

24 

nTV-T.AROR-rOnF 

•        •        •        •            X  V       XJxxxJVxJLx  \^\JLJl-t 

ELEMENT 

25 

FIO 12 -PURCHASE- ORDER 

»          9         X          x/  ^  A>       X   W  XxV^  XX  XXkXX^       X.^XXJ.X  X.i(XX 

MODULE 

X  X  V,/  A^  XX  '  '  ^ 

26 

.    .    .   WORK -ORDER-NUMBER 

•           •           •          V  1  XXX  XX  \         XXX  XXX  XJ  XX              XX  J-  X  XX 

ELEMENT 

27 

PTTRCHASF- ORnPR— NTTMRFR 

PT.EMENT 

28 

^  VJ 

MODTTT.E 

1  1      L/  \J  Xj  XJ 

29 

PTIRrHASK-ORnPR— NTTMRRR 

•        •        •        •       IT  \J  £yx^  LlntJl-J      xxXxLxi_jXx           U  1*1  XxXj  Xx 

ELEMENT 

30 

•       •        •        •      X       \y     XJ  X  1^  xli      XX  i-il  1     \^  X 

ELEMENT 

*  *  '  '  '  «i  '  1  X_JL^  J. 

31 

MODTIT.R 

11 XX  Xx  \J  XJ  i-i 

32 

r>T\7T  QTHN— Tn 

•       •       •      JLyXVXoXVyLN  XU 

Pr.PMPNT 

XjXJ  Xlil'lXlil^  X 

33 

.    ,    ,   WORK -ORDER-NUMBER 

ELEMENT 

34 

.    .    .   W-0- BALANCE 

ELEMENT 

35 

.    ,   FIO 12-CUST-INVO ICE-LINE-ITEM 

MODULE 

36 

.    .    .   CUSTOMER- ID 

ELEMENT 

37 

.    .    .   WORK -ORDER-NUMBER 

ELEMENT 

38 

.    .    .  DIVISION-ID 

ELEMENT 

39 

.    .    .   W-0- BALANCE 

ELEMENT 

40 

***  END       OF  INDEX 
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